





VOL. XXVIII          JANUARY, 1937          NO. 1

The Journal of Educational
Psychology


Devoted Primarily to the Scientific Study of Problems of Learning and Teaching





CONTENTS 


The Decreasing Accuracy of Scholastic Predictions . . . . . . . . . . .  1
        E. G. WILLIAMSON
A Proposed Procedure for Increasing the Efficiency of Objective Tests .  17
        JOHN C. FLANAGAN
Immediate Quality: A Factor in the Application of Psychology . . . . .  22
        B. O. SMITH
Reliability of Telebinocular Tests of Beginning Pupils . . . . . . . .  31
        ARTHUR I. GATES AND GUY L. BOND
Ten Experiments on Whole and Part Learning . . . . . . . . . . . . . .  37
        MILTON B. JENSEN AND AGNES LEMAIRE
The Variation in Pattern of Factor Loadings . . . . . . . . . . . . . .  55
        RUSSELL C. SMART
Various Techniques of Combining Ratings . . . . . . . . . . . . . . . .  65
        RUDYARD K. BENT
Social Dominance of Clerical Workers and Sales-persons as Measured
  by the Bernreuter Personality Inventory . . . . . . . . . . . . . . .  71
        ARTHUR F. DODGE





$6.00 per Year - Published Monthly September to May 


WARWICK & YORK, INC. 
BALTIMORE, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md. 
under the Act of March 3, 1879; additional entry as Second Class Matter at York, Pa. 










THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Established 1910 
EDITORS 
Jack W. Dunlap
Fordham University
Harold E. Jones                    Percival M. Symonds
University of California           Teachers College,
                                   Columbia University
H. E. Buchholz, Managing Editor





The price of the Journal is $6.00 a year in the United States; $6.40 in foreign
countries. Part-year subscriptions are 90 cents for each number ordered. Back
volumes are $7.00 each; back issues are $1.10 each except when more than five years
old, and then $1.20 each.

Subscribers should notify the publishers of change in address at least four weeks
before publication of the issue with which the change is to take effect. Claims for
non-receipt of an issue will not be honored unless made within two weeks after
receipt of the next succeeding number.

Unsolicited manuscripts should be accompanied with return postage. Manuscripts,
books and other materials for review, and correspondence regarding editorial
and business matters should be addressed to the Publishers.


WARWICK AND YORK - Publishers - BALTIMORE, MD.




















THE ACTIVITY MOVEMENT 


By Clyde Hissong


In an attempt to overcome the weaknesses of the traditional school 
organization many progressive schools have developed new programs. 
These programs are so similar in character that collectively the 
changes have been referred to as the activity movement. This 
movement has claimed the center of the educational stage for a length 
of time sufficient to have engendered widespread interest in its out- 
comes and in its basic philosophy. 

In Doctor Hissong’s study an attempt has been made to discover 
the principles underlying the present activity movement, to determine 
the influence of traditional concepts in shaping the trends of the 
movement, and to see if in the light of the present knowledge of the 
child and his relation to his environment the movement rests upon a 


justifiable basis. 
$2.00 plus 10¢ postage 


WARWICK AND YORK 






Publishers BALTIMORE 













THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 







Volume XXVIII January, 1937 Number 1 





















THE DECREASING ACCURACY OF SCHOLASTIC 
PREDICTIONS 


E. G. WILLIAMSON 


University of Minnesota 


One function of personnel research in higher education is to perfect 
instruments for measuring those variables which have a high degree of 
relationship with college grades, to the end that there may be a mini- 
mum of error in predicting individual achievement. The personnel 
use of such valid measuring instruments permits the early identification 
of students whose subsequent scholarship will be low or unfavorable. 
Such procedures provide a basis for administrative action to decrease 
scholastic mortality by identifying probable failing students and 
directing them into types of academic or non-academic training more
in line with their abilities. The diverting of such low-aptitude students
should make it unnecessary to eliminate such students after registration
in college by the very prevalent method of “flunking.” Thus
personnel procedures should permit the maintenance of high scholastic
standards or even the raising and wider application of those standards, 
thereby both producing students better trained for occupational 
success and increasing the service of the college to society by gradu- 
ating only high-grade professional men and women. 

Personnel research, therefore, perfects instruments for the selection 
and identification of students with given probabilities of scholastic 
success and personnel counseling makes use of these instruments in 
dealing with individual students. Thus far little attention has been 
given to the counseling of high-aptitude students for whom predictions 
of success are encouraging, but many of whom fail to achieve consistently
with this high prediction because of distracting factors which
interfere with the realization of potentialities. The major attention
in counseling has been given to low-aptitude students because of the
belief that, with the marked increase in enrollment in colleges subsequent
to the World War, the proportion of low-aptitude students
coming to college has greatly increased. This impression was not 
tested experimentally until a decade or more after it was first enun- 
ciated. Research at the University of Minnesota failed to verify this 
belief.¹


TABLE I.—CHANGES IN MEAN SCORE, SIGMA, AND CORRELATION WITH SCHOLARSHIP
OF THE 1926 FORM OF COLLEGE APTITUDE TEST

     |              Men               |             Women              |             Total
Year |    N   Mean Pctl.     SD     r |    N   Mean Pctl.     SD     r |    N   Mean Pctl.     SD     r
1926 |  444  243.6    50   60.0   .51 |  348  253.6    56   55.4   .48 |  792  248.0    53   58.3   .50
1928 |  552  250.5    54   60.6   .38 |  441  259.4    60   55.9   .55 |  993  254.5    56   58.7   .45
1934 |  502  279.6    72   49.4   .31 |  449  278.3    72   48.4   .43 |  951  278.9    72   48.9   .39
1935 |  454  283.5    74   52.0   .38 |  373  278.1    72   51.9   .50 |  827  281.0    73   52.1   .43

N = number of students; Mean = mean total raw score; Pctl. = mean percentile rank*;
SD = standard deviation of total raw score; r = correlation with scholarship.

* Percentile norms based on all Arts college freshmen enrolling in September 1926.


DECREASING CORRELATION WITH COLLEGE SCHOLARSHIP 


Since the success of any personnel or counseling system is dependent 
upon the accuracy of its predictive devices in forecasting student 
achievement, it is well to ask what success has been achieved in 
increasing the accuracy of such prediction.² The following table
shows that over a period of years use of the same tests, and of high- 
school scholarship, has resulted in constantly decreasing correlations 
with scholarship in the College of Science, Literature and the Arts in 
the University of Minnesota. Unfortunately, the same form of test 
was not used for the entire period of years 1929 to 1935 inclusive. This 
situation makes analysis difficult, but the period was selected because 
of major educational and administrative changes beginning in 1929. 
The Minnesota college aptitude test shows decreasing correlations with 
scholarship, between 1926 and 1935, from a magnitude of .50 to .43 
for all freshmen (Table I) for one form of the test, and from .40 to
.25, between 1929 and 1935, for another form (Table II).³ A later
section of this paper will discuss the trends for men and women
considered separately. During the period 1926 to 1935 high-school
scholarship percentile rank shows a correlation with college grades
decreasing only slightly from .57 in 1926 except for the coefficient of
.48 in 1934 (Table III).

¹ Williamson, E. G.: “Changes in College Freshman Intelligence,” School
and Society, Vol. XLII, October 19, 1935, pp. 547-551.

² We are concerned in this paper only with the use of tests for prediction and
not with other uses such as comparison of groups by means of standard tests,
studies of changes in group or individual test scores, and the use of tests in equating
control groups in educational research.

³ In recent correspondence, Ben Wood reports similar results for Columbia
College. He states: “ . . . from 1919 to about 1923 the correlation between the
Thorndike intelligence test and college grades averaged around .60 to .65, whereas
since that time the correlations have averaged around .40 to .45. Since about
1924 we have used the Thorndike intelligence scores and the placement test scores
in promoting bright pupils and have permitted freshmen in the lowest five or ten
per cent to take only the minimal hours of work. The pupils at the top are
allowed, if not encouraged, to overload themselves with too many courses and
outside activities. I think this procedure is entirely justified educationally, but
it tends to lower our correlation coefficients.”


TABLE II.—CHANGES IN MEAN SCORE, SIGMA, AND CORRELATION WITH SCHOLARSHIP
OF FORM AMCN OF COLLEGE APTITUDE TEST

     |              Men               |             Women              |             Total
Year |    N   Mean Pctl.     SD     r |    N   Mean Pctl.     SD     r |    N   Mean Pctl.     SD     r
1929 |  465  302.6    41   65.6   .40 |  415  316.3    49   53.3   .37 |  880  309.1    45   60.5   .40
1930 |  447  302.5    41   62.6   .33 |  416  312.3    47   49.7   .47 |  863  307.2    44   56.9   .39
1933 |  352  327.5    55   52.3   .30 |  392  327.6    55   47.4   .39 |  744  327.6    55   49.8   .34
1934 |  502  321.2    52   53.5   .17 |  449  322.4    52   51.0   .33 |  951   321.    52   52.4   .24
1935 |  454  332.6    52   52.9   .19 |  373  331.6    59   49.7   .36 |  827  332.2    60   51.5   .25

N = number of students; Mean = mean total raw score; Pctl. = mean percentile rank*;
SD = standard deviation of total raw score; r = correlation with scholarship.

* Percentile norms based on all Arts college freshmen enrolling in September 1929.


INCREASED HOMOGENEITY OF APTITUDE


Since high-school scholarship (percentile rank of grades for four
years of each student in his senior class) is based upon the performance
of differing classes of students, one would not expect this device
necessarily to maintain the same magnitude of relationship. But, since the
college aptitude test is identical in each comparison (two different
forms being used in two separate comparisons) for these years, it seems
advisable to seek the reason for this decreasing accuracy of prediction
of student scholarship. Have changes in the college situation brought
about this decrease in forecasting efficiency? Or what
other possible causes are there for this decrease? In answer to these
questions a number of possible causes will be reviewed. Upon their
significance as causes of these decreasing correlations we cannot yet
pass definitive judgment. But it is most likely that no single factor,
but rather the interrelation of a number of factors, is operative in this
situation, the beneficial effects of some factors being offset by the
detrimental effect of others.

The data in Tables I, II, and III show an increased mean and a 
decreased standard deviation of the distributions for the two forms of 
the test, as well as for high-school scholarship. When we use the 1926 
percentile norms as a standard of comparison for one form of the test, 
we find that the mean percentile for freshmen studied has increased 
from 53 in 1926 to 73 in 1935. The standard deviation of the raw 
score has decreased from 58.3 to 52.1. On form AMCN the mean 
percentile rank has increased from 45 in 1929 to 60 in 1935 with the 
standard deviations of the raw scores being 60.48 in 1929 and 51.53 in 
1935. The trends in means and sigmas of high-school scholarship are
similar to those of the two tests, being 62.4 mean percentile rank in 
1926 and 73.0 in 1935. The sigmas for percentile ranks for these two 
years are 29.4 and 22.1 respectively. 


TABLE III.—CHANGES IN MEAN PERCENTILE RANK, SIGMA, AND CORRELATION
WITH COLLEGE GRADES OF HIGH-SCHOOL SCHOLARSHIP

     |           Men            |          Women           |          Total
Year |    N   Mean     SD     r |    N   Mean     SD     r |    N   Mean     SD     r
1926 |  444   54.2   29.5   .64 |  348   72.9   25.7   .52 |  792   62.4   29.4   .57
1928 |  552   54.7   29.4   .52 |  441   73.5   25.1   .60 |  993   63.0   29.1   .56
1929 |  465   53.4   28.8   .51 |  415   72.6   24.7   .53 |  880   62.4   28.6   .53
1930 |  447   55.9   31.8   .49 |  416   70.4   25.4   .59 |  863   62.9   28.5   .52
1933 |  352   67.3   24.9   .53 |  392   76.2   20.2   .52 |  744   71.9   23.0   .51
1934 |  502   68.0   22.2   .47 |  449   77.4   19.2   .48 |  951   71.9   21.5   .48
1935 |  454   69.5   23.2   .52 |  373   77.4   19.8   .53 |  827   73.0   22.1   .53

N = number of students; Mean = mean percentile rank*; SD = standard deviation
of percentile rank; r = correlation with college grades.

* Percentile rank based on the average grade in three and one-half years of high school of each
senior class. Obviously these percentiles are computed separately each year for each senior class
in a particular high school.


SEX DIFFERENCES IN INCREASED HOMOGENEITY 


The data in Table I reveal that, on the 1926 form of the test, the 
mean percentile of men freshmen increased slightly more than did the 
mean percentile of women freshmen, 50 to 74 for men and 56 to 72 for 
women. The sigma of the distribution of total raw scores decreased 
from 60.0 to 52.0 for men, and from 55.4 to 51.9 for women. The 
correlation with Fall-quarter scholarship for men decreased from .51 
to .38, but in 1928 the coefficient dropped to .38 even though the sigma 
remained the same as in 1926. But, for women, there were only minor 
fluctuations except for the increase in 1928 not paralleled by any 
change in sigma. The trend is similar for form AMCN summarized in 
Table II. The coefficients for men decreased from .40 to .19 but the 
coefficients for women remained relatively constant except for 1930.

Similar sex differences are shown in Table III for high-school 
scholarship. The mean percentile for men increased from 54.2 in 
1926 to 69.5 in 1935 with the sigma of distribution decreasing from 
29.5 to 23.2. The mean for women increased from 72.9 to 77.4, or 
about one-third the increase for men. The sigma for women decreased
from 25.7 to 19.8. The correlation with Fall-quarter scholarship for 
men remained approximately the same as did the coefficients for women 
except for the years 1928 and 1930. Clearly, in this case the increased 
coefficients of .60 and .59 for the women were not caused by greater 
heterogeneity in those years, since the sigmas for the latter two years 
are of about the same magnitude as in 1926. Moreover, changes in 
the sigmas for men were not paralleled by changes in the coefficients 
of correlation, the sigma for 1930 being larger but the coefficient lower 
than for 1926. The sigma for 1935 was lower but the coefficient as 
large as that for 1926. 


EFFECT OF HOMOGENEITY UPON CORRELATION WITH COLLEGE 
SCHOLARSHIP 


In case the criterion, college scholarship, has been modified in only
a “consequential manner”¹ due to changes in range or homogeneity of
test scores, then application of the formula* [expression illegible in
the source] should
restore the values of r for the later years up to the magnitude of r for
the earlier years. To what extent is restriction of range responsible
for the decreased magnitude of the coefficients for the tests? Table
IV shows, for men and women considered together, that six of the nine
coefficients are increased in magnitude by such a correction. But the
coefficients for the 1926 form for the years 1928, 1934, and 1935 differ
from the coefficient for the year 1926 to the extent of critical ratios of
1.35, 1.60, and 1.04 respectively. The corresponding critical ratios for
form AMCN for the years 1930, 1933, 1934 and 1935 each compared
with the year 1929 are respectively −.03, −.07, 3.31, and 2.84.

¹ Kelley, T. L.: Statistical Methods, New York: The Macmillan Company,
1924, pp. 224-225.
* Kelley, T. L.: Op. cit., pp. 224-225.
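The idea of correcting a coefficient for restricted range can be illustrated numerically. Because the printed formula is not legible here, the sketch below does not claim to reproduce Kelley's expression; it assumes the widely taught correction for direct restriction on the predictor, and the function name is hypothetical. The inputs are the 1930 total-group figures for form AMCN from Table II (r = .39, SD = 56.9) and the 1929 SD of 60.5.

```python
import math


def correct_for_range_restriction(r, sd_restricted, sd_unrestricted):
    """Estimate r in the wider-range group from r in the narrower one.

    Assumption: direct restriction on the predictor, with the regression
    slope and the error variance unchanged between the two groups.
    """
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Form AMCN, 1930 freshmen, referred to the 1929 spread of scores.
corrected_1930 = correct_for_range_restriction(0.39, 56.9, 60.5)
print(round(corrected_1930, 2))  # prints 0.41
```

When the two standard deviations are equal the function returns r unchanged, which is the behavior any such correction should have.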


TABLE IV.—CORRECTION OF VALUES OF COEFFICIENTS OF CORRELATION FOR
INCREASED HOMOGENEITY OF TOTAL FRESHMAN CLASSES

            1926 form                 |             AMCN form
Year   Uncorrected r   Corrected r    | Year   Uncorrected r   Corrected r
1926        .50            .50        | 1929        .40            .40
1928        .45            .45        | 1930        .39            .41
1934        .39            .44        | 1933        .34            .40
1935        .43            .46        | 1934        .24            .26
                                      | 1935        .25            .28




















It is clear in the case of form AMCN that the restricted range or 
increased homogeneity has not been responsible for the decreased 
magnitude of the coefficients of correlation. For form 1926, the
critical ratios do not reach the desired magnitude of 2.0,¹ but they are
consistent and of such magnitude as to indicate that probably the 
restricted range of the later years has not been the sole cause of the 
decreased coefficients, and that we are justified in searching for other 
and additional causes. As pointed out earlier, changes in the coeffi- 
cients for high-school scholarship are not paralleled by consistent 
changes in sigmas. Apparently unknown factors have operated to 
keep the magnitude of the coefficients stable despite changes in sigmas. 
Kelley’s correction for restricted range was not applied to high-school 
scholarship because the percentiles are relative and not absolute scores, 
being computed separately for each class and not on the basis of an 
absolute and constant standard as in the case of the two tests. 
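The paper does not state how the critical ratios were computed. A minimal sketch, assuming they are differences between Fisher z-transformed coefficients divided by the standard error of that difference, is given below; the function name is hypothetical. The inputs are the corrected coefficients of Table IV and the total-group N's of Table I for the 1926 form.

```python
import math


def critical_ratio(r1, n1, r2, n2):
    """Difference of two independent correlations in standard-error units.

    Assumption: Fisher's z transformation, z = atanh(r), whose sampling
    variance is taken as 1 / (n - 3) for each group.
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se_diff


base_r, base_n = 0.50, 792  # 1926 freshmen, 1926 form
later_years = [(1928, 0.45, 993), (1934, 0.44, 951), (1935, 0.46, 827)]
for year, corrected_r, n in later_years:
    print(year, round(critical_ratio(base_r, base_n, corrected_r, n), 2))
```

Under this assumption the three ratios agree to two decimals with the reported 1.35, 1.60, and 1.04, which suggests, without proving, that a procedure of this kind was used.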





¹ Fisher, R. A.: Statistical Methods for Research Workers, Edinburgh:
Oliver and Boyd, 1932, pp. 182-183.







CHANGES IN EDUCATIONAL AND ADMINISTRATIVE PROCEDURES 


Certain changes in educational and personnel practices and in 
grading may provide at least partial explanations of these decreasing 
coefficients. These factors may have made it increasingly difficult to 
maintain accuracy of prediction. Wood stated many years ago that 
as soon as administrators should begin to make changes in educational 
procedure and practice in line with personnel policies and practice, 
then we might expect decreases in accuracy of prediction. In order 
to maintain the original forecasting efficiency it is necessary that the 
conditions of the original experiment be maintained rigidly. If the 
results of personnel research are not translated into educational and 
personnel practice, then we might expect to maintain the same level 
of accuracy of prediction. For example, having first determined that 
low-aptitude students fail, if we continued a laissez faire policy of 
permitting such students to select the most difficult subjects, then our 
predictions should maintain approximately the same degree of accuracy. 
But, on the other hand, it is the purpose of personnel practice to upset 
this prediction in that the counselor seeks to divert students from 
courses which they would fail, into courses which they may pass. 
Such a practice, of course, if widespread, would lower the correlation, 
since it artificially raises the student’s grade level. Johnston has 
reported an experiment of an educational change which has had this 
effect.¹ During the period 1930 to 1932, low-aptitude students were
not permitted to take foreign languages and laboratory science courses 
which formerly had been the cause of failure of such students. Such 
students were required to enroll only in survey lecture courses which, 
in many cases, they passed, thus displacing their scholastic rank. 
Many students who, formerly, would have failed more difficult courses 
now maintain at least an average grade of C in these courses which 
are more in line with their academic aptitudes. Such a practice 
actually prevents many failures and is consistent with the policy and 
goal of personnel counseling, but it does work havoc with forecasting 
efficiency and with routine methods of personnel research. 
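The argument that successful counseling lowers the correlation can be made concrete with a toy simulation. Everything in the sketch below is hypothetical: the data are synthetic, the cut-off of -0.5 standard units for "low aptitude" and the grade floor of 0.0 are arbitrary choices, and the seed merely makes the run repeatable. It shows only the direction of the effect, not its size at Minnesota.

```python
import math
import random


def pearson(xs, ys):
    """Plain product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1937)  # fixed seed, chosen arbitrarily
aptitude = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Laissez faire: grades depend on aptitude plus noise (population r = .50).
grades_laissez_faire = [0.5 * a + rng.gauss(0.0, math.sqrt(0.75)) for a in aptitude]

# Counseling: low-aptitude students are steered into courses they can pass,
# modeled here as a floor on their grades.
grades_counseled = [
    max(g, 0.0) if a < -0.5 else g
    for a, g in zip(aptitude, grades_laissez_faire)
]

r_before = pearson(aptitude, grades_laissez_faire)
r_after = pearson(aptitude, grades_counseled)
print(round(r_before, 2), round(r_after, 2))
```

The second coefficient comes out lower than the first because the students whose low grades were most predictable are exactly the ones whose grades the intervention raises.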

¹ Johnston, J. B.: University of Minnesota Bulletin, Vol. XXXV, No. 64, pp.
103-108.

We may mention other practices, introduced since 1926 in the
University of Minnesota, and particularly in the Arts college, which
may be additional possible disturbing factors in the predictive
efficiency of tests and other variables. The development of a system of
faculty counseling, if at all efficient, may guide students away from
failures, by motivating them more nearly to capacity and if possible
above their predicted capacity, and especially by the selection of
courses in line with interests and aptitudes.

Curricular reorganizations introduce still other changes in the 
criterion. The exemption of high-aptitude students from basic
courses such as English and mathematics may result in such students
receiving C or B grades in a more difficult course whereas in an easier 
course they might have received a grade of A. In other cases high- 
aptitude students may receive a C or D grade in a course already 
mastered in high school, whereas, if advanced to new and more interest- 
ing courses, they may be stimulated to work up to their superior 
capacity. The instituting of subfreshman courses actually prevents
many failures by thus adjusting the content of the course to the needs, 
capacity, and knowledge of students. Placement examining prior to 
instruction in the elementary course in a department is now rather 
widespread and includes English, mathematics, German, French, and 
Spanish. Students who formerly would have failed more difficult 
and advanced courses for which they were not prepared, now are 
placed in easier courses in line with their present status of intellectual 
development and thus have greater possibility of getting passing 
grades. This practice may in turn lower the correlation and upset the 
prediction, since it violates the uniform conditions of the original 
predictive studies, namely that students shall be permitted to take 
courses regardless of their probability of succeeding in them. 

The organization of the University Testing Bureau in 1932 was 
still another administrative change interfering with predictive effi- 
ciency. Many more students are now given clinical guidance services 
so that they will have better educational and vocational orientation, 
select their courses in line with interests and aptitudes, and, if neces- 
sary, be referred for rehabilitation in how-to-study courses and through 
scholastic motivation by counselors. Moreover, counselors in the 
various colleges have become increasingly active and effective in 
advising students with regard to vocational choices, the selection of 
college courses and more effective study habits. 

A special course in study methods was instituted because many 
students of high aptitude did not achieve up to the level of that apti- 
tude. Bird has shown that this method does rehabilitate many 
students.¹ It also may displace the ranks of other students so that
they achieve better than their aptitude tests and high-school scholarship
would predict.

¹ Bird, Charles: Effective Study Habits, New York: Century, 1931.


ANOTHER POSSIBLE CAUSE 


It is possible, however, that none of the above factors have caused 
these reduced correlations reported in Table I. The cause may lie in 
the criterion of college grades, not only in the unreliability of those 
grades but also in the failure of the faculty to adjust grading standards 
to changes in level of aptitude. The admissions policy of the Arts 
college changed in 1932 so that low-aptitude students were refused 
admission and sent to another unit of the University. The basis for 
this action was, for the most part, percentile rank in form AMCN of 
the college aptitude test and high-school scholarship. This adminis- 
trative change, in large part, accounts for the increase in the average 
aptitude test score in 1934 and 1935. This policy makes it possible 
for the college faculty to have a more homogeneous group with regard 
to high level ability, presumably the type of student able to absorb 
the material presented in classes. 

It is well, however, to raise the question, in view of this marked 
increase in level of ability of Arts college freshmen, whether the faculty 
now assigns higher grades, on the average, for these higher aptitude 
students. Does the faculty assign more passing grades in line with 
this marked increase in level of aptitude, or does the faculty continue 
to grade on the so-called normal curve? If student ability increases 
but the faculty continues to give the same percentage of failing grades, 
then the correlation between predictive tests and grades may be 
reduced. Those students who formerly passed with D’s or C’s may 
now fail, since many, if not most, of the low-aptitude students have 
been refused admission. The faculty may operate on the assumption 
that some one must flunk in the freshman class; otherwise academic 
standards are not being maintained. Such a concept may be a residual 
of the normal-curve type of thinking. Such maintaining of normal- 
curve grading in the face of increased aptitude actually would operate 
to increase grading standards, thereby making the getting of passing 
grades more difficult now than in former years and again upsetting 
scholastic predictions. 
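A small worked case may help to show how grading on a fixed curve raises the standard when the class improves. The scores and the 20 per cent failure quota below are invented for illustration; nothing in the paper gives such figures, and the function name is hypothetical.

```python
def lowest_passing_score(scores, fail_fraction):
    """Return the lowest score that passes when a fixed share must fail."""
    ordered = sorted(scores)
    n_fail = int(len(ordered) * fail_fraction)
    return ordered[n_fail]


# Hypothetical absolute achievement scores for two freshman classes.
earlier_class = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
later_class = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]  # low scorers not admitted

student_score = 55
passes_earlier = student_score >= lowest_passing_score(earlier_class, 0.2)
passes_later = student_score >= lowest_passing_score(later_class, 0.2)
print(passes_earlier, passes_later)  # prints True False
```

The same achievement that earned a passing grade in the earlier class falls below the quota line in the later one, which is the sense in which a fixed curve operates to raise grading standards.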

Tables V and VI indicate the extent of adjustment of grading to
increases in level of aptitude in terms of the number and per cent of
students at three ranges of percentile ranks receiving average grades of
C or higher in their first quarter of residence in the Arts college. The
percentages of students with satisfactory grades (an average of C or
higher) increased from 45.45 in 1926 to 53.45 in 1935. But the
increase occurred between 1926 and 1928, whereas the marked increase
in ability occurred in 1934 and 1935 for one form of the test and in
1935 for the other, and not in 1928. It is apparent, also, that this
increase in percentage is found for men and not for women, the
percentages for the latter remaining relatively constant despite the fact
that the increase in average C.A.T. was as large for women as for men.
A possible explanation is found in the fact that the increase in high
school percentile for women for these four years was from 72.9 in 1926
to 77.4 in 1935 whereas the corresponding increase for men was from
54.2 to 69.5. If high-school grades are an index of ability and skill
and willingness to use that ability, then the increase in per cent of
men with satisfactory college grades is caused by the better quality
of freshmen men. There has been no corresponding increase in quality
of women students as indicated by high-school scholarship. It should
be noted, however, that the per cent of women with satisfactory grades
is still higher than that for men, even though the latter show a larger
increase. It is quite apparent that increased quality of college
scholarship tends to parallel, in so far as there is any relationship, for the
groups of freshmen, increases in high-school scholarship rather than
increases in scores on the college aptitude test.


TABLE V.—PER CENT OF STUDENTS MAKING SATISFACTORY FIRST-QUARTER
GRADES IN COLLEGE IN RELATION TO THE 1926 NORMS ON THE 1926 C.A.T.

Men
1926 C.A.T. |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  131   78  59.54 |  200  134  67.00 |  261  166  63.61 |  259  153  59.07
31-70       |  171   62  36.26 |  193   84  43.52 |  200   76  38.00 |  151   59  39.07
1-30        |  142   20  14.08 |  159   47  29.56 |   41   10  24.39 |   44   14  31.82
Total       |  444  160  36.04 |  552  265  48.01 |  502  252  50.20 |  454  226  49.78

Women
1926 C.A.T. |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  126   93  73.81 |  182  147  80.77 |  232  162  69.83 |  190  137  72.11
31-70       |  137   85  62.04 |  164   90  54.88 |  184   88  47.83 |  144   70  48.61
1-30        |   85   22  25.88 |   95   21  22.11 |   33   14  42.42 |   39    9  23.08
Total       |  348  200  57.47 |  441  258  58.50 |  449  264  58.80 |  373  216  57.91

Men and Women
1926 C.A.T. |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  257  171  66.54 |  382  281  73.56 |  493  328  66.53 |  449  290  64.59
31-70       |  308  147  47.73 |  357  174  48.74 |  384  164  42.71 |  295  129  43.73
1-30        |  227   42  18.50 |  254   68  26.77 |   74   24  32.43 |   83   23  27.71
Total       |  792  360  45.45 |  993  523  52.67 |  951  516  54.26 |  827  442  53.45

N = Total number of students;
n = Number of students with average grade of C or higher;
% = Per cent of students with average grade of C or higher.


TABLE VI.—PER CENT OF STUDENTS MAKING SATISFACTORY FIRST-QUARTER
GRADES IN COLLEGE IN RELATION TO HIGH-SCHOOL PERCENTILES

Men
H.S.        |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  152   99  65.13 |  196  150  76.53 |  249  168  67.47 |  246  168  68.29
31-70       |  176   48  27.27 |  214   74  34.58 |  220   76  34.55 |  170   53  31.18
1-30        |  116   13  11.21 |  142   41  28.87 |   33    8  24.24 |   38    5  13.16
Total       |  444  160  36.04 |  552  265  48.01 |  502  252  50.20 |  454  226  49.78

Women
H.S.        |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  225  160  71.11 |  280  209  74.64 |  330  228  69.09 |  249  173  69.48
31-70       |   85   32  37.65 |  118   45  38.14 |  105   34  32.38 |  112   40  35.71
1-30        |   38    8  21.05 |   43    4   9.30 |   14    2  14.29 |   12    3  25.00
Total       |  348  200  57.47 |  441  258  58.50 |  449  264  58.80 |  373  216  57.91

Men and Women
H.S.        |       1926       |       1928       |       1934       |       1935
percentiles |    N    n      % |    N    n      % |    N    n      % |    N    n      %
71-100      |  377  259  68.70 |  476  359  75.42 |  579  396  68.39 |  495  341  68.89
31-70       |  261   80  30.65 |  332  119  35.84 |  325  110  33.85 |  282   93  32.98
1-30        |  154   21  13.64 |  185   45  24.32 |   47   10  42.55 |   50    8  16.00
Total       |  792  360  45.45 |  993  523  52.67 |  951  516  54.26 |  827  442  53.45

N = Total number of students;
n = Number of students with average grade of C or higher;
% = Per cent of students with average grade of C or higher.
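The per cents in Tables V and VI follow from the two counts in each cell. As a brief illustration, the sketch below recomputes the 1926 column for men in Table V from the printed counts; the function and variable names are hypothetical and the rounding to two decimals is assumed from the tables.

```python
def per_cent_satisfactory(n_total, n_c_or_higher):
    """Per cent of students with an average grade of C or higher."""
    return round(100.0 * n_c_or_higher / n_total, 2)


# Men, 1926, by 1926 C.A.T. percentile band: (total students, students with C or higher).
men_1926 = {"71-100": (131, 78), "31-70": (171, 62), "1-30": (142, 20)}

for band, (n_total, n_c) in men_1926.items():
    print(band, per_cent_satisfactory(n_total, n_c))

total_students = sum(n_total for n_total, _ in men_1926.values())
total_c = sum(n_c for _, n_c in men_1926.values())
print("Total", total_students, total_c, per_cent_satisfactory(total_students, total_c))
```

The band totals add to 444 students, of whom 160 had satisfactory grades, or 36.04 per cent, as printed in the table.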

The percentage of satisfactory students classified by levels of high- 
school percentiles and C.A.T. remains relatively constant for the upper 
level except for C.A.T. for women and high-school grades for men. 
Such fluctuations are difficult to understand. The only consistent 
trend is found in the decreasing percentage of women with C.A.T. 
percentiles between 31 and 70. Similar fluctuations are found for the 
lower level, the numbers being quite small in 1934 and 1935 due to the 
college admissions policy. Moreover, the average 1935 freshman is 
superior to the average 1924 freshman who graduated from a combined 
arts and professional course such as law.' 

The increase in percentage of men with C or higher average grades, 
from 45.45 to 53.45, is the only clear indication of adjustments in 
grading standards to conform to increased student aptitude as meas- 
ured by high-school grades and by the college aptitude test. Whether 
this increase is proportionate to, or all that is justified by, increases in
level of aptitude is a matter of opinion. But the average college fresh- 
man in 1935 has approximately the same percentile on C.A.T. as the 
average college freshman enrolling in this college in 1924 and grad- 
uating in 1928. That is, the average freshman in 1935 is equal to the 
average freshman in 1924 who successfully completed four years of 
college work. This seems to indicate that most instructors are con- 
tinuing to grade on an inflexible curve of distribution without regard 
to changes in student ability. Certain departments of the college, 
however, report a serious attempt to adjust to variations in student 
ability and achievement even though an abnormally large number of 
students may receive passing grades. But it is apparent that standards
cannot be maintained unless there is improvement in the examinations
permitting a comparison of achievement from year to year.
Liberal and widespread use of standardized and comparable examinations
in more classes would seem indicated if any effective adjustment
of grades to ability is to result. One envies the student in one institution
reported to make such an excellent selection of its freshmen that it
needs to fail only a small percentage of these students. But of what
value is it to select more capable students if there results only an
insignificant reduction in the number of failures? And by what
predictive methods could one select if the criterion is based upon
achievement curves apparently with but little regard to level of
performance and ability?

1 Williamson, E. G.: "Guidance Use of Senior College Norms," Occupations,
Vol. XV, Oct., 1936, pp. 26-30.

It may be that this apparent failure to readjust grading standards 
to increased student aptitude is the result of a genuine desire and 
conscious effort on the part of the faculty to raise the standards for 
high-aptitude students, now that the so-called drag of low-aptitude 
students is removed from the classroom. It is, of course, within the 
rights and prerogatives of any college faculty to maintain standards 
which seem justified to it. If a college sets as its objective the training
of only high-aptitude students, then the maintenance of high standards
is justified, but one may still expect that these standards be consistent
and commensurate with student ability and that the percentage of
successful students be increased in proportion to increases in aptitude.
Otherwise injustices may be done to students and any hope for effective
personnel work be diminished.


AN INDEX OF PERSONNEL EFFICIENCY? 


It is possible that the lowered validity correlation is an indication 
of the effectiveness of the personnel program in bringing about better 
adjustments between academic pursuits and student aptitudes and 
interests. Adjustments in selection, placement, registration, and 
grading are justified from the personnel point of view even though 
lower validity correlations are produced by such procedures. Perhaps 
personnel workers should expect further lower correlations, not higher 
ones, as indications of efficiency and effectiveness. Personnel practice 
may be a counteracting force which operates to reduce the validity 
correlation of predictive tests. The inability of the standard regression 
equation to make allowances for important individual conditions, for 
example excessive outside work, which may not have importance for 
the average student but may have significance for certain individuals, 
may disturb unduly the validity correlation. The test-maker seeks 
to devise instruments which will predict probabilities of success for 
the average student with given relevant characteristics, but the 
personnel counselor must use these instruments to predict achievement 
to be measured by an unstable criterion, under heterogeneous con- 
ditions and for students differing from this hypothetical average 
student. Wagner and Strabel¹ have demonstrated the effect upon
validity coefficients of homogeneous grouping of students with regard 
to sex, size of high school, foreign-born parentage and other variables. 
But we know little about the effect of other factors which are uncon- 
trolled by the standard regression equation except by a procedure of 
averaging their influence for the hypothetical average student. The 
actual working conditions under which individual students perform 
are largely a mystery even to the counselor, and their effects, even if 
known, are scarcely measurable.² The increasingly widespread development
of the personnel procedures described above, which are factors
in the individualization of education, may have introduced factors
which disturb and depress the predictive efficiency of the tests used at
Minnesota.


POSSIBLE CORRECTIONS FOR LOW VALIDITY CORRELATIONS 


Frequent suggestions have been made that the correlation between 
predictive indices and college grades would be higher were it not for 
the low reliability of college grades. It may well be that widespread 
adoption of Tyler's methods of improving final examinations³,⁴ will
produce higher correlations between predictive indices and college
grades. But the present meagre evidence available does not indicate
that higher validity coefficients will result. Reitz⁵ reports that the
correlation between the American Council Psychological Examination 
and average college grades has increased slightly and the corresponding
coefficient for high-school rank has decreased slightly more as compared
with results of previous studies. Since college scholarship has been
measured at Chicago entirely by comprehensive examinations, the
order of predictive efficiency of high-school rank and Psychological
Examination has been reversed. Reitz refers to the use of new objective
and comprehensive examinations in college as the possible or
probable cause of these slight changes and reversals in relationship
only indirectly inasmuch as his reason for the reinvestigation of
scholastic prediction was the adoption of the new plan of measuring
scholastic achievement.

1 Wagner, M. E. and Strabel, E.: "Homogeneous Grouping as a Technique for
Improving Prediction Coefficients," School and Society, Vol. XL, 1934, pp. 887-888.

2 This viewpoint is discussed at greater length in Student Personnel Work:
An Outline of Clinical Procedures, by E. G. Williamson and J. G. Darley. New
York: McGraw-Hill, 1937.

3 Tyler, Ralph W.: "Evaluation: A Challenge to Progressive Education,"
Progressive Education, Vol. XII, 1935, pp. 552-556.

4 Tyler, Ralph W.: Constructing Achievement Tests, Columbus: Ohio State
University Bureau of Educational Research, 1935.

5 Reitz, Wilhelm: "Forecasting Marks of New Plan Students at the University
of Chicago," School Review, Vol. XLIII, 1935, pp. 34-48.

It is quite likely, however, that objective final examinations will 
prove useful in increasing predictive efficiency provided certain addi- 
tional corrective procedures are adopted. Merely using a new objec- 
tive examination each quarter without regard to comparability with 
previous examinations does not correct the non-comparability of 
subjective marks based on essay examinations. We are still attempt- 
ing to predict a new type of grade each quarter. Moreover, trans- 
lating point scores on an objective examination into letter grades does 
not correct for the coarseness of grouping of grades based on essay 
examinations. Particularly is this true when teachers vitiate still 
further these letter grades by raising or lowering them in terms of 
weekly essay quizzes or term papers subjectively graded. It may 
well be that the advantage of higher reliability afforded by the new 
type criterion of scholarship is more than offset by the lack of
comparability, coarseness of grouping into letter grades, and vitiating
of scores by mixing in subjective grades.
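
The suggestion that unreliable grades hold the validity correlation down is commonly expressed through Spearman's correction for attenuation. The sketch below is not taken from this paper. It corrects for unreliability of the criterion only, and the observed correlation and grade reliability in it are invented figures chosen for illustration.

```python
import math

def correct_for_criterion_unreliability(r_xy, r_yy):
    """Estimated predictor-criterion correlation if the criterion were perfectly reliable.

    r_xy: observed validity correlation (predictor vs. college grades)
    r_yy: reliability coefficient of the criterion (college grades)
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / math.sqrt(r_yy)

# Hypothetical figures, not data from the study
observed_validity = 0.50
grade_reliability = 0.64
corrected = correct_for_criterion_unreliability(observed_validity, grade_reliability)
print(round(corrected, 3))
```

A correction of this kind raises the attainable ceiling only. It does nothing about non-comparability of examinations from quarter to quarter or about coarse letter grouping, which are the factors stressed above.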

Still another factor may prevent increased predictive correlations 
with new type final examinations as opposed to the old type of teacher’s 
estimates. As mentioned earlier in this paper, standard regression 
equations are derived for the average student and are in error for many 
individual students whose conditions determining scholarship are not 
identical with those of the hypothetical average student. Such errors 
are not corrected by use of new type examinations. In view of these 
three factors not corrected by present use of new type examinations, 
one may remain cautious in accepting the arguments (evidence is not 
yet available) for objective examinations based solely on hopes for 
higher validity coefficients. Tyler and other workers have advanced
other, and possibly more realizable, arguments in favor of new type
examinations.

A possible correction for the errors inherent in the standard regres- 
sion coefficient may be found in attempts to predict scholarship for 
individual students by means of clinical diagnoses and counseling 
procedures. In such a predictive method, the standard regression 
equation for the average student may be supplemented or corrected 
for particular students by means of clinical data resulting from 
intensive and extensive analysis of each student with regard to the 
probable effect upon scholarship of factors which may not operate for 
the average student. Such factors as negativistic attitudes, inefficient 
study skills despite high aptitude, excessive outside work, worries 
about finances and excessive social activities may not influence the 
scholarship of the average student but they may be operative in the 
case of many individual students and thereby upset the correlation 
between ability and scholarship. Clinical diagnosis, supplemental to 
standard regression equations and followed by intensive counseling 
to correct these disturbers of prediction, would seem to be a promising 
field of personnel research in the prediction of college scholarship. 








A PROPOSED PROCEDURE FOR INCREASING THE 
EFFICIENCY OF OBJECTIVE TESTS 


JOHN C. FLANAGAN 
Coöperative Test Service of the American Council on Education


INTRODUCTION 


In recent years the use of the Spearman-Brown prophecy formula 
has led to the lengthening of many tests in order to secure more reliable 
measures. It seems to have been frequently overlooked that this 
formula demands that the items to be added to a test must be similar 
in nature, including equal reliability coefficients, to those in the 
original short form. 
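
For reference, the prophecy formula in its usual modern notation (the symbols are not the author's) gives the reliability $r_{kk}$ of a test lengthened $k$ times from the reliability $r_{11}$ of the original form:

```latex
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}
```

The formula holds only when the added material is parallel to the original, which is the condition the paragraph above says is frequently overlooked.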

The importance of this point is shown by the fact that some published
tests would be more "reliable" if half their items were not
printed. For example, the reliability coefficient of a test composed of
one hundred items having item-intercorrelations of 0.15, would be
0.95. If another group of one hundred items having intercorrelations
of 0.05 and measuring the same general function are added to this test,
the reliability coefficient for the total test composed of two hundred
items is 0.94.¹
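
The two figures can be reproduced approximately with a short calculation. This is a sketch under stated assumptions rather than the author's own working. It treats the number of items as the lengthening factor in the Spearman-Brown formula, assumes the two hundred-item groups have equal standard deviations, and assumes they measure the same function so that their intercorrelation is the geometric mean of their reliabilities. The composite formula is the standard one for the reliability of a sum of two parts, and may differ in form from the Kelley and Spearman formulas the article cites.

```python
import math

def spearman_brown(item_intercorrelation, n_items):
    """Reliability of a test of n_items items with a common inter-item correlation."""
    r = item_intercorrelation
    return n_items * r / (1.0 + (n_items - 1) * r)

def composite_reliability(r1, r2, r12, sd1=1.0, sd2=1.0):
    """Reliability of the sum of two part scores.

    r1, r2: reliabilities of the parts; r12: correlation between the parts.
    """
    true_variance = sd1**2 * r1 + sd2**2 * r2 + 2 * sd1 * sd2 * r12
    total_variance = sd1**2 + sd2**2 + 2 * sd1 * sd2 * r12
    return true_variance / total_variance

good_half = spearman_brown(0.15, 100)   # the original hundred items
weak_half = spearman_brown(0.05, 100)   # the added hundred items

# Assumption: both halves measure the same function, so their true scores
# correlate perfectly and the observed correlation is sqrt(r1 * r2).
r12 = math.sqrt(good_half * weak_half)
combined = composite_reliability(good_half, weak_half, r12)

print(round(good_half, 2), round(combined, 2))
```

Under these assumptions the hundred-item test comes out near 0.946 and the two-hundred-item test near 0.944, in line with the 0.95 and 0.94 quoted above: the weaker items lower the reliability of the longer test.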

In practice the distribution of item "validities" in a preliminary
form usually covers a considerable range. Therefore the question
of how many of these items shall be included always arises. Besides
the contribution made by the item to the reliability of the test there
are many other factors to be considered. Frequently, it is necessary
to have a wide range of difficulty present in the test so that it may be
used effectively at various levels. An item requiring a short time to
answer is to be preferred to one requiring a long time to answer even
though the latter is slightly more "valid." Also, the items in certain
types of tests, notably achievement tests, must be scattered over the
field and not concentrated on one aspect of the subject in question.
Furthermore, various purposes require different degrees of accuracy
of measurement and for many purposes a short test representing only
a slight loss in "reliability" is greatly preferred to the somewhat more
"reliable" longer form.





1 This value is obtained using Formula 147 in T. L. Kelley, Statistical Method; 
and Formula 19 of the appendix of C. Spearman, Abilities of Man. The assump- 
tion is made that the standard deviations of the scores on the two groups of items 
are equal. 

In addition to a decision as to the number of items to be included 
in a given form, the test-constructor must set a time-limit for the test. 
Since individuals vary widely in their speed of responding to items and 
because most standard tests are given under conditions which demand 
that all individuals must terminate their work by a given time, the 
factor of speed is usually of considerable importance in determining 
the test-score. Certain test-constructors, notably Toops, have felt 
so strongly that emphasis on the speed-factor is undesirable that they 
have discarded time-limits entirely for certain tests. The importance 
of speed may be diminished by reducing the number of items to be 
attempted during a given period of time. This, however, immediately 
decreases the efficiency of the procedure, since many are forced to 
remain idle a considerable portion of the time. 

Present test-scores have the additional disadvantage of being 
combinations of speed and accuracy in unknown proportions for a 
given individual. An individual who has formed the habit of checking 
his work to insure complete accuracy is frequently at a disadvantage. 
In the succeeding paragraphs a form of test is proposed which it is 
believed will eliminate many of the difficulties mentioned above. 


THE REPEATING-SCALE FORM OF TEST 


The essence of this technique is that instead of one long scale with 
items proceeding from easy to difficult, there shall be several short 
scales each graded in difficulty. For example, instead of one scale of 
one hundred items for a vocabulary test, there might be four scales 
of twenty-five items each. One of the chief merits of the plan would 
derive from the placing of the most "valid" items in the first of the
twenty-five-item scales, the next most "valid" items in the second
scale, and so on. Of course, the scales should be selected so that each
contains a similar distribution of item difficulties.

The number of the first twenty-five items answered correctly would
represent a "level" score for this scale. Similarly, "level" scores
would be obtained for all scales which the student has finished in the
time allowed. The total number of items answered correctly for all
the scales including the last one attempted would be the individual's
"speed" score. The "level" score would be the average of the
"level" scores made on the scales completed. The "speed" and
"level" scores might be used separately or in the combination which
had been found to yield the most valid index of the function being
measured.
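
The scoring rule just described can be sketched as follows. The data layout is an assumption made for illustration: each response is True (correct), False (wrong), or None (not reached when time was called), listed in test order, and a scale counts as completed only if every one of its items was reached.

```python
def score_repeating_scale(responses, scale_size=25):
    """Return (level_score, speed_score) for one examinee.

    responses: list of True / False / None in test order, where None marks
    an item the examinee had not reached when time was called.
    """
    scales = [responses[i:i + scale_size]
              for i in range(0, len(responses), scale_size)]

    # "Level" scores exist only for scales finished in the time allowed.
    completed_counts = [sum(1 for r in scale if r is True)
                        for scale in scales
                        if len(scale) == scale_size and None not in scale]
    level = (sum(completed_counts) / len(completed_counts)
             if completed_counts else None)

    # "Speed" score: every correct answer, including the unfinished last scale.
    speed = sum(1 for r in responses if r is True)
    return level, speed


# Hypothetical examinee: finishes two scales and reaches 10 items of the third.
answers = ([True] * 20 + [False] * 5      # scale 1: 20 right
           + [True] * 18 + [False] * 7    # scale 2: 18 right
           + [True] * 6 + [False] * 4     # scale 3: 6 right of 10 reached
           + [None] * 15                  # rest of scale 3 not reached
           + [None] * 25)                 # scale 4 not reached
level_score, speed_score = score_repeating_scale(answers)
print(level_score, speed_score)
```

For this examinee the "level" score is the mean of 20 and 18, and the "speed" score adds the six correct answers from the unfinished third scale.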

With respect to time-limits, the "level" scores would be practically
unaffected by varying the time limits over a wide range. Thus
"level" scores could be legitimately compared with norms even though
circumstances necessitated a fairly large deviation, downward or
upward, from the time-limits recommended. Furthermore, "speed"
scores could be easily converted from those resulting from any particular
time-limit to those for the recommended time-limit by the simple
expedient of multiplying the obtained "speed" score by the ratio of
the two time-limits. This involves the assumption that all items
require the same interval of time, but, since the easy and difficult
items are arranged in repeating-scales, the assumption is much less
violent than those implicit in the usual testing procedures.¹
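
The conversion described above is a single multiplication. In this sketch the function and parameter names are illustrative, and the ratio is taken as recommended time over time actually allowed, which is the direction needed to express an obtained score on the recommended basis.

```python
def convert_speed_score(obtained_score, minutes_allowed, minutes_recommended):
    """Rescale a "speed" score to the recommended time limit.

    Rests on the article's stated assumption that all items take about
    the same time to answer.
    """
    if minutes_allowed <= 0 or minutes_recommended <= 0:
        raise ValueError("time limits must be positive")
    return obtained_score * minutes_recommended / minutes_allowed

# Hypothetical case: 30 items right in 20 minutes, norms based on 30 minutes.
equivalent = convert_speed_score(30, 20, 30)
print(equivalent)
```

Where the equal-time assumption is doubtful, empirically determined tables of equivalent scores would take the place of the simple ratio.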

Since the first scales would be most valid and reliable, all individuals
would be quite accurately rated as to "level" regardless of
considerable variations in the time limits allowed. In fact, with a
distribution of "item-reliabilities" such as that mentioned in the
second paragraph of this article, an increase in "reliability" and
"validity" would be obtained by decreasing the time limits of the test
in "repeating-scale" form. It should be noted that many published
tests include a sufficient number of comparatively non-discriminating
items to make possible a real improvement by such a procedure.


A "REPEATING-SCALE" VOCABULARY TEST


In an effort to determine the desirability of the "repeating-scale"
form for the Coöperative English Vocabulary test, the items on one
form of this test were divided into four scales on the basis of previously
obtained data as to the "difficulty" and "validity" of the items. The
division into scales was accomplished by assigning the ten or twenty
easiest items to places in the various scales, then a group of the next
easiest items, etc., until all items were assigned. The distribution of
"item-difficulties" in each scale was made as closely similar as possible,
with the most "valid" items in the first scale, the next most "valid"
items in the second scale, etc.
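
One way to mechanize this assignment is sketched below. It is an interpretation, not the procedure actually used: items are sorted from easiest to hardest, cut into consecutive difficulty blocks, and within each block the most valid items go to scale 1, the next to scale 2, and so on. The dictionary keys (`id`, `difficulty`, `validity`), the block size, and the convention that a larger `difficulty` value means a harder item are all assumptions.

```python
def build_repeating_scales(items, n_scales=4, block_size=20):
    """Split items into n_scales scales with a similar spread of difficulty.

    items: list of dicts with hypothetical keys "id", "difficulty"
    (larger = harder) and "validity" (larger = more discriminating).
    block_size must be a multiple of n_scales.
    """
    if block_size % n_scales != 0:
        raise ValueError("block_size must be a multiple of n_scales")
    per_scale = block_size // n_scales
    by_difficulty = sorted(items, key=lambda it: it["difficulty"])
    scales = [[] for _ in range(n_scales)]

    for start in range(0, len(by_difficulty), block_size):
        block = by_difficulty[start:start + block_size]
        # Within one difficulty band, rank by validity, best first.
        ranked = sorted(block, key=lambda it: it["validity"], reverse=True)
        for rank, item in enumerate(ranked):
            scale_index = min(rank // per_scale, n_scales - 1)
            scales[scale_index].append(item)

    # Present each scale from easy to difficult.
    return [sorted(s, key=lambda it: it["difficulty"]) for s in scales]


# Tiny made-up pool: 8 items, 2 scales, difficulty blocks of 4.
validities = [0.5, 0.3, 0.6, 0.2, 0.4, 0.7, 0.1, 0.8]
pool = [{"id": i, "difficulty": i, "validity": v}
        for i, v in enumerate(validities)]
first, second = build_repeating_scales(pool, n_scales=2, block_size=4)
print([it["id"] for it in first], [it["id"] for it in second])
```

Each scale draws the same number of items from every difficulty band, so the scales have similar difficulty distributions while the first scale collects the most discriminating items.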





1 Although the use of this assumption appears to be warranted in dealing with
many types of test items, the point is not crucial, since by merely having a group
of individuals mark the item upon which they were working at various
time-intervals, it would be quite easy to prepare tables of empirically determined
equivalent scores for a number of different time limits. "Speed" scores could
then be converted for comparison with the norms by the use of the tables, or
norms could be given for various time limits.

The papers of a new group of two hundred individuals were then 
scored for each combination of items. The resulting Pearson-product- 
moment intercorrelations for the various scales together with the 
appropriate means and standard deviations are shown in Table I. 
The correlation of scale one with scale two is seen to be 0.842 whereas 
the correlation of scale three with scale four is but 0.754. The correla- 
tions with total score shown in the last column also show a difference 
between the scores of the first and last scales. These latter correlations 
of course contain a spurious element since each part is contained in 
the total. 


TABLE I. THE MEANS, STANDARD DEVIATIONS, AND PEARSON-PRODUCT-MOMENT
INTERCORRELATIONS OF SCALES 1, 2, 3, 4, AND TOTAL-SCORES FOR THE
COÖPERATIVE ENGLISH VOCABULARY TEST, SERIES 1, FORM 1933

N = 200

Scale                     1       2       3       4     Total
Standard deviation      4.81    5.07    4.26    4.48    17.17
Mean                    8.10    8.88    9.12   10.00    35.64
1                               .842    .790    .815     .937
2                                       .801    .794     .936
3                                               .754     .902
4                                                        .908

The reliability coefficients of tests containing the first scale, first 
two scales, first three scales, and all four scales are given in Table II. 
It is apparent that only a very slight increase in the “reliability” 
of the test is obtained by using the last two scales. This indicates 
that the time limit of the test could be reduced if it were in the form of a
"repeating-scale" test with little, if any, loss in information. The
added information contained in the "level" and "speed" scores should
more than compensate for the slight decrease in the reliability
coefficient.


TABLE II. THE RELIABILITY COEFFICIENTS OF VARIOUS COMBINATIONS OF
SCALES AS PREDICTED FROM THEIR INTERCORRELATIONS

Scales            Number of items    Reliability coefficients
1                        25                   .842
1 and 2                  50                   .915
1, 2, and 3              75                   .929
1, 2, 3, and 4          100                   .940
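
The article does not say which formula produced Table II. As a rough cross-check, the sketch below applies the generalized Spearman-Brown formula to the mean intercorrelation of the scales involved, taking the intercorrelations from Table I. This is an assumed method that ignores the differences in standard deviation between scales, so agreement with the table can only be approximate.

```python
from itertools import combinations

# Intercorrelations of the four 25-item scales, from Table I
R = {(1, 2): 0.842, (1, 3): 0.790, (1, 4): 0.815,
     (2, 3): 0.801, (2, 4): 0.794, (3, 4): 0.754}

def predicted_reliability(scales):
    """Spearman-Brown estimate for the sum of the given scales (two or more)."""
    pairs = list(combinations(sorted(scales), 2))
    mean_r = sum(R[p] for p in pairs) / len(pairs)
    k = len(scales)
    return k * mean_r / (1.0 + (k - 1) * mean_r)

estimates = {k: predicted_reliability(range(1, k + 1)) for k in (2, 3, 4)}
for k, value in estimates.items():
    print(k, round(value, 3))
```

The estimates fall close to, though not exactly on, the tabled values, and show the same pattern: most of the gain comes from adding the second scale, and very little from the third and fourth.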

Previous experience with this vocabulary test has shown that the
scores of individuals averaging fifteen or twenty points higher on this
test give somewhat larger reliability coefficients. This is to be
expected since the item "validities" are slightly higher in general for
the more "difficult" items of this particular test. Since the difference
between the item "validities" in the first twenty-five-item scale and
succeeding scales is quite constant throughout the range of difficulty,
the use of a group whose average score was either higher or lower than
that of the present group should have no effect on the general nature of
the results.


SUMMARY 


It is suggested that the efficiency, flexibility, and validity of many
types of test might be improved by using a "repeating-scale" form of
test. In this form items are divided into short scales having similar
distributions of difficulty but so arranged that the "best" items are in
the first scale, the "next best" items in the second scale, etc. An
analysis of a vocabulary test indicates that the "repeating-scale"
form of test would make possible a decrease in time limit with little,
if any, loss in information.


IMMEDIATE QUALITY: A FACTOR IN THE 
APPLICATION OF PSYCHOLOGY 


B. O. SMITH
University of Florida 


The success of persons not possessing a knowledge of scientific 
psychology in controlling or adjusting to social situations is a common- 
place of everyday experience. It is not an unusual event for a person 
untutored in scientific psychology to be very effective in controlling 
the opinion of his community and in directing the conduct of his 
fellow men. Almost anyone can readily call to mind individuals who 
have made outstanding records as salesmanagers, salesmen, and 
promoters without the slightest knowledge of scientific psychology. 
Moreover the psychology displayed in literature is hardly comparable 
to that found in standard treatises on psychology. Some principles 
of scientific psychology may be implicit in the literary treatment of 
human nature but no one would argue that such principles need be 
explicitly present to the mind of the author. 

The familiar examples mentioned above, however, are in them- 
selves little evidence that a knowledge of scientific psychology would 
not play an effective part in such situations. For, after all, many 
such examples can be found in the history of any science or scientific 
technology. There were engineers who attained considerable success 
in the techniques of construction long before there was a science of 
engineering. Likewise, there were successful physicians before there 
was a science of medicine. And it is common knowledge that lan- 
guages were learned and spoken long before any structural analysis 
or grammar of them was evolved. 

Nevertheless, the modern physician is not one who can boast 
of his lack of knowledge of the science of medicine. Nor is the 
successful engineer of today one who lacks mastery of the sciences 
underlying the technology of engineering. A modern physician or 
engineer who, without a technical knowledge of his profession, equals 
or excels in efficiency the technically trained members of his pro- 
fession would be an occasion for wonder and amazement. For the 
technical knowledge accumulated in these professions has lifted the 
practitioner above the level of the empirical craftsman. 

When we attempt to stretch the argument to cover scientific 
psychology, however, our efforts are refuted by the ineffectiveness
of such psychology in dealing with the problems that arise out of the 
social process. The failure of a knowledge of scientific psychology to 
function in solving such problems is perhaps nowhere better shown 
than in the profession of teaching. Almost all teachers today have 
done considerable work in scientific psychology. But let anyone who 
has fortified himself with the principles of scientific psychology sub- 
sequently take up the task of teaching and he will readily recognize 
the inefficacy of his psychological knowledge to function in the situa- 
tions as they arise in the classroom. Many persons make failures as 
teachers apparently because they lack the very qualities which a 
study of scientific psychology is supposed to supply. Such persons 
are usually lacking in the qualities necessary to enable them to get 
along with others, to understand them, and to control and to direct 
their thinking, feelings, and actions. 

The functional value of scientific psychology is, therefore, brought 
into question by the failure of those persons who have command of 
its principles to utilize those principles in solving the problems that 
arise in the course of their everyday affairs. Indeed, the students of 
psychology appear to get along no better with their fellow men than 
do students of biology, chemistry, or other subjects. 

It should be apparent to the close observer that the actual amount 
of scientific psychology which contributes to the solution of the 
problems of human relationships is very small when compared to the 
vast body of material which the study of psychology has accumulated 
or to the large amount of empirical psychology actually used in 
solving these problems. The cause of this failure of a knowledge of 
scientific psychology to function in the actual process of social inter- 
course affords the problem of this discussion. 

This problem has been recognized, but it has not received the 
attention which it deserves. In a recent article Skaggs¹ has recognized
the problem and has attempted to account for the failure of scientific 
psychology to help one to adjust to or to control life situations on the 
ground that these situations are much more complex than the regulated 
and controlled conditions of experimentation. In other words, the 
formulations of scientific psychology cannot be generalized to cover 
the situations that arise in everyday affairs. 

Skaggs closes his article with the discouraging statement that he
"can see no other conclusion than that scientific psychology is and
must be of little practical value . . . The more scientific the psychologist
becomes the more must he retire from the general and
complicated problems to more restricted problems and work in isolation
from the world at large. There is simply nothing that can be
done about the matter." If this conclusion be true, then the metaphysicians
should prepare to receive the psychologists and to share
with them their favorite pastime.

1 Skaggs, E. G.: "The Limitations of Scientific Psychology as an Applied or
Practical Science." Psychological Review, Vol. XLI, pp. 572f.

There is indeed much to be said for this position which holds that 
the failure of scientific psychology to function is due to the fact that 
life situations are more complex than experimental conditions. But 
the present writer is not fully convinced that scientific psychology 
cannot be made more functional in life situations, nor does he accept 
the view implied by Skaggs that there is no systematic alternative to 
the approach of scientific psychology to the study of human nature. 

Klein,¹ who has also recognized the failure of scientific psychology
to function, has offered a very stimulating account of a new type of
psychological approach to the understanding of human nature. This
approach, it is claimed, would yield an understanding of human
nature that would function in our social technologies. His discussion
is based on the work of the German school of "cultural science
psychology" (geisteswissenschaftliche Psychologie). According to Klein,
"the content of all the Geisteswissenschaften is immediate experience
in so far as that experience is determined by the interaction between
objects and discerning and manipulating subjects."

Klein goes on to state that the cultural sciences, of which psy- 
chology is one, differ from the natural sciences in two respects. In 
the first place, the cultural sciences do not depend upon such abstrac- 
tions as are used in the natural sciences; but rather they rest upon 
immediate experience. In the second place, the cultural sciences are 
characterized by their recognition of values while the natural sciences 
seek for "'value-free' abstractions designated as 'natural' law."
Hence "cultural science psychology" is another approach to the
study of human nature and that approach is through "experiential
phenomena sui generis." Klein, however, offers "cultural science
psychology" not as a substitute for scientific psychology nor as a
new school of psychology, but as an additional and profitable way of
approaching the study of human nature.





1 Klein, D. B.: "Scientific Understanding in Psychology." Psychological
Review, Vol. XXXIX, pp. 552-569.

The present writer finds much in common between this viewpoint
and his own. Although in the present paper no attempt is
made to discard scientific psychology, the writer feels confident that
the viewpoint of this psychology neglects many aspects of human
behavior. And a study of human behavior through immediate or
surface phenomena sui generis seems to him to promise great reward.

Nevertheless, the present writer observes that the formulations
of natural science are applied to the ordinary world every day and he
assumes that perhaps those of scientific psychology could likewise be
made more functional if there were some adequate criterion by which
they could be applied. The practitioner cannot make use of scientific
laws and facts from a knowledge of them alone, however neat
and complete that knowledge may be, for, and in this statement lies
the matrix of our problem, in no case should it be assumed that
scientific laws and facts are to be used as rules of practice in any
technology. As Dewey has pointed out, they have only an indirect
and liberative value to the practitioner.¹

The present paper is therefore concerned with the search for some 
key by which the formulations of scientific psychology may be applied 
to the affairs of everyday life. Only by implication does it deal with 
a new approach to the understanding of human nature, vital as that 
may be. It is the thesis of this paper, then, that the failure of scientific 
psychology to function in the problems arising out of the social 
process is due largely to its disregard of the immediate qualities of 
human relationships and behavior. 

By immediate qualities we mean the qualities of directly observable 
behavior; for example, the tone of the voice, facial expressions, bodily 
movements, and the like. These directly observable qualities are 
the starting points of scientific psychology. For it is the problems 
which arise out of the matrix of these qualities that set the beginning 
tasks of psychology. And the area of natural phenomena marked
off by these qualities fixes the field for psychological exploration and
investigation.

But scientific psychology, after the fashion of the physical sciences, 
has not been content to grub around on the level of immediate qualities. 
Rather it has found it necessary to abandon such qualities and to dig 
beneath them to the underlying structures and processes in terms 
of which the immediate qualities themselves could be explained. 





1 Dewey, John. Sources of a Science of Education, pp. 28-30. 






















26 The Journal of Educational Psychology 


The reason for this surrender of immediate qualities is fairly 
obvious. After all, immediate qualities offer no satisfactory basis
for scientific formulations. For such qualities are unique and final, 
provoking a flux of conflicting and unorganizable experiences. Sci- 
ence, however, is concerned with the problem of establishing constant 
relationships among phenomena and in reducing the flux of the naive 
world to order and unity. In areas where immediate qualities are 
mostly unique and uncertain in tenure, it is therefore urgent for 
science to abandon such qualities and to seek for their explanation in 
simpler and more stable essentials. For this reason scientific psy- 
chology has forsaken the phenomenological approach in which the 
method is to determine relations between directly observable phenom- 
ena. In its place it has substituted the more indirect approach of 
atomic physics in which the complex observable phenomena are 
explained by reference to unobservable, simple events. 

This point of view is well illustrated in a quotation from Thorndike.
“The hypothesis,” says Thorndike, “which we present and
shall defend admits the distinction in respect to surface behavior, but
asserts that in their deeper nature the higher forms of intellectual
operation are identical with mere association or connection forming,
depending upon the same sort of physiological connections but requiring
many more of them.”¹

Note that immediate qualities—surface behavior—are admitted 
but they have no scientific status, for they are explained by recourse 
to the events of a hidden mechanism much as color is eliminated in 
physical considerations by recourse to wave lengths. Here differences 
in qualities are reducible to differences in quantity and the ambition 
of science to reduce the flux of qualities to a common denominator is 
fulfilled. 

It makes no difference whether the common denominator which 
is finally accepted is physiological connections, physico-chemical 
entities, or neural energy, the pattern of explanation is the same. 
Immediate qualities are denied an explanatory rôle. They are themselves
objects of explanation but not the data of explanation. Consequently,
in the logic of scientific psychology there is apparently no
good reason why immediate qualities should be brought up for special
consideration. But with the denial of an explanatory rôle to immediate
qualities has come also a neglect of these qualities as points of





1 Thorndike, E. L.: Measurement of Intelligence, p. 415. 


Immediate Quality 27 


reference in the application of the scientific formulations of psychology. 
As a consequence scientific psychology has severed its only connecting 
link with the ordinary world, like the boy in the story who sawed the 
limb between himself and the tree. 

It is true that preoccupation with the structures and processes
underlying immediate qualities has resulted in greater scientific under- 
standing and unity of explanation, but it has been at the expense of the 
functional value of such insight in our social technologies. Indeed, 
human nature as described by scientific psychology is not the same as 
the human nature which we deal with in the social situations which arise 
in our everyday social intercourse. The psychology which we use 
in dealing with our fellow men and in depicting to others the feelings, 
thoughts, and actions of other persons is not the psychology studied 
either by experimentation or statistical analysis. The qualities of 
naive human nature are very remote indeed from the explanations of 
them by scientific psychology. And the deeper the science of psy- 
chology is pushed into the structures and processes underlying naive 
human nature, the more divorced it becomes from the ordinary world 
of human affairs as we live in it and experience it. Scientific psy- 
chology overlooks the fact that in social situations we have to deal 
directly with immediate qualities of human nature in their uncertain 
and fleeting tenure, rather than with statistical tabulations and 
formulas or with experimental observations and formal principles, 
however valuable these may be in another context. 

Nevertheless, we should not forget the success of modern science in 
controlling and shaping events on the level of sentiency by recourse to 
an underlying and hidden mechanism. For example, wave lengths 
are very remote from colors but they afford the means of understanding 
and controlling events of which color is an immediate quality as well as 
events such as electromagnetic phenomena with which color has only 
a very remote connection, if any. It is evident that such an intellec- 
tual formulation of immediate qualities is not an end in itself. It is 
merely a means by which events characterized by immediate qualities 
may be more adequately controlled and directed. In other words, 
recourse to simple elements has purely a functional value in scientific 
explanation. 

It is only reasonable to expect that such intellectual formulations 
of the immediate qualities of human behavior would yield similar 
results. But when we attempt such formulations in respect to these 
qualities, the resulting control of situations is not in keeping with our 

































anticipations. This intellectual reduction of qualities to underlying
structures and processes still leaves untouched the great body of
problems which we encounter in our everyday relationships with others,
problems which are too mobile, kaleidoscopic, and fleeting to permit
more than a passing analysis. 

What is lacking is some method by which these intellectual formula- 
tions may be made functional. In other words, scientific psychology 
is in danger of forgetting the instrumental character of its elementary 
units or underlying structures and processes. When this occurs the 
functional value of psychology is completely lost and its conclusions 
become empty statements. 

The method of rendering the intellectual formulations of psychology 
functional seems to lie in a recognition of the part which immediate 
qualities play in behavior. We deal with individuals and groups 
and what we say and do is conditioned more by the verbal and physical 
expressions (immediate qualities) of those about us and by what we 
ourselves have in mind than by what we know of scientific psychology, 
perhaps not because the psychology is incapable of functioning but 
because we have no key by which to apply its various formulations. 

Let us take for example the principle of readiness as formulated by 
Thorndike. A teacher may understand the principle thoroughly but 
still be unable to determine whether or not his pupils are in a state of 
readiness. For readiness is defined in terms of the physiological state 
of the organism, and a teacher is rarely a physiologist and even if he 
were he could hardly conduct a physiological analysis in his classroom. 
To teach one the principle of readiness and then expect him to apply 
it in actual teaching without a recognition of the immediate qualities 
which signify its presence or absence in his pupils is almost comparable 
to expecting an artist to paint from a knowledge of the numerical 
indices of colors. 

Sensitiveness to immediate qualities should prove the key which 
unlocks the functional value of the formulations of scientific psychology.
This is implied in a statement from Dewey. In a discussion
of the objects of science he says, “Whatever are designated as elements,
whether logical, mathematical, physical or mental, depend especially
upon the existence of immediate, qualitatively integral objects. 
Search for elements starts with such empirical objects already pos- 
sessed. Sensory data, whether they are designated psychic or physical, 
are thus not starting points; they are the products of analysis. Denial 
of the primary reality of immediate empirical objects logically ter- 






minates in an abrogation of the reality of elements; for sensory data, 
or sensa and sensibilia, are the residua of analysis of those primary 
things. Moreover every step of analysis depends upon continual 
reference to these empirical objects. Drop them from mental view 
for a moment and any clew in search for elements is lost. Unless 
macroscopic things are recognized, cells, electrons, logical elements 
become meaningless.”¹

The foregoing statement is primarily concerned with immediate 
qualities as guides to analysis, but it is no less true that they are 
guides for the application of that which is discovered by analysis. If 
it is true that the loss of mental view of these qualities for a moment 
impairs the search for elements, it is equally true that failure to recog- 
nize them as they actually occur will preclude the application of the 
intellectual formulations of these qualities in terms of the elements. 

Meyer recognized the value of immediate qualities in understanding
human behavior when he said that “There is always a place for elements,
but there is certainly a place for the large momentous facts of
human life just as we find it. . . . The psychopathologist had to learn
to do more than the so-called ‘elementalist,’ who always goes back to
the elements and smallest units and then is apt to shirk the responsibility
of making an attempt to solve the concrete problems of greater
complexity. The psychiatrist has to study individuals and groups as
wholes, as complex units, as the ‘you,’ or ‘he’ or ‘she’ or ‘they’ we
have to work with.”²

If in the light of the discussion we review cases in which people 
untutored in scientific psychology have been successful in dealing with 
situations involving psychology or the cases in which those who have 
command of the principles of psychology have failed in such situations, 
we will find that in each case sensitiveness to immediate qualities 
played a significant part. The former are what are sometimes referred 
to as born teachers, or salesmen and so on. They possess an unusual 
facility in the recognition and use of qualities which render very 
effective what psychological knowledge they have empirically picked 
up. The latter understand the principles and procedures of psy- 
chology, but they lack the ability to connect the immediate qualities 
of social situations with the appropriate principles and procedures. 





1 Dewey, John: Experience and Nature, p. 144.
2 Meyer, Adolph: A Psychiatric Milestone, pp. 32, 38. Quoted by Dewey:
Experience and Nature, p. 145.




























They are doubtless not sensitive to all the pertinent qualities, and even 
those which are recognized are not all seen in their diagnostic capacity. 

There appears to be no reason to suppose that the ability to recog- 
nize immediate qualities and to understand their implications cannot 
be taught and learned. Perhaps when we give as much attention to 
this phase of psychology as we give to the study of underlying struc- 
tures and processes, we shall develop reliable criteria for a psychological 
technology and for the functional use of the formulations of psychology 
in a variety of fields. But if the study of psychology continues to 
follow the practice of science in its neglect of qualities, it is conceivable 
that it may build up a body of principles and information, but still 
make no provision for its effective application. 





RELIABILITY OF TELEBINOCULAR TESTS OF
BEGINNING PUPILS¹


ARTHUR I. GATES 


Teachers College, Columbia University 
AND 
GUY L. BOND


Fredonia State Normal College 


In a study of the relationships between a large number of charac- 
teristics of the children determined at the time of entering the first 
grade, again at the middle of the year, and finally at the end of the 
year, and reading ability measured in the middle and end of the year, 
it was found that the results of tests of the Betts Ophthalmic Tele- 
binocular obtained at these three intervals differed in a number of 
instances. In order to tell whether these differences were due to 
actual changes in vision or to variability in performance, it was neces- 
sary to repeat the tests at a short interval. This was done for twenty- 
six pupils, who were first tested about three weeks after the opening 
of school and a second time within a week. The pupils were all 
drawn from a class of “dull-normal” children (IQ’s between seventy-
five and ninety) in the Speyer Experimental School jointly operated
by Teachers College and the Board of Education of the City of New 
York. The tests were given by the same examiner, who had had 
about two years’ experience and had used the instrument with a large
number of first-grade as well as older children. It is believed that the 
tests were conducted as efficiently as they are likely to be under 
typical school examining conditions. The materials and methods 
described in Betts’ manual, dated June, 1934, were employed. 

The data which are given in detail in Table I are reported in the 
form of scoring recommended in the manual. In some cases, actual test
figures are given and in others such symbols as N (normal), D (doubtful),
F (failed). In rare instances the results are regarded as “indeterminate,”
which means that, because of the pupil’s inability to
understand the directions, or to follow the procedures prescribed, no 
satisfactory measurement could be secured. The most practical 
question raised in this study is whether a pupil is likely to be rated as 
normal in one test and in another as doubtful or failure, or, in other 





1 This study is one of a series made possible by a WPA grant of funds and 
assistants. 






















words, as having normal vision in one test and defective vision or eye 
muscle control in another given within the same week. 

It should be noted that the study reveals only whether a pupil is 
rated qualitatively or quantitatively the same on two different tests. 
It does not reveal the causes of the differences when they appear.
Different diagnoses could be produced by various
causes, such as the following: defects in the instrument, crudeness
of the scores, records, or observations obtained, different interpretations
of directions by the pupil at different times, different degrees of
attention or different techniques of taking the test at different times, 
uncontrollable variations in efficiency or in visual functionings from 
time to time, and various special factors which may make visual 
efficiency greater at one time than at another. It might be expected 
that the variability of performance, whether due to physical condi- 
tions, level of attention, influence of distractions, different interpreta- 
tions of directions, and the like, would be greater among these young 
children in the dull-normal range just entering school than among 
older, more experienced and brighter pupils. It should be noted in 
particular that the variations may be due to actual differences in vision 
at different times and not to any unreliability of the test or any fault 
in the instrument or directions or scoring procedure. It is, of course, 
conceivable that differences in fatigue or eye irritation, digestion, or 
other factors may result in clearer vision or better eye motor control on 
one day than on another or at one hour of the day than at another hour 
of the same day. The inquiry, in other words, is set up to determine 
whether it is advisable in the case of such young children to give more 
than one test, or whether one test is an essentially infallible index 
of the function it is designed to measure. 

The test for binocular vision, data for which are not presented
here, shows all twenty-six of the pupils to be normal on both tests.

The test of far-point-fusion scored according to the manual as 
normal, doubtful, or failure, gives normal vision in both tests in the case 
of twenty-two children, normal in one and doubtful in the other in the
case of two, and failure in one and doubtful in the other in the case
of two. In approximately fifteen per cent of the cases, in other words, 
the results of the two tests do not agree. For absolute accuracy, it 
would be necessary to give this test at least twice to pupils of the type 
on which this study is based. 
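
The figure of approximately fifteen per cent can be checked with a short computation. The following is a minimal sketch in Python, assuming only the counts stated in the paragraph above; the names `far_point_pairs` and `disagreement_rate` are illustrative and do not come from the study or the Betts manual.

```python
# Counts of rating pairs for far-point fusion, as stated in the text.
# N = normal, D = doubtful, F = failure. The order within a pair is not
# significant here; the text reports only "one test" and "the other".
far_point_pairs = {
    ("N", "N"): 22,  # normal in both tests
    ("N", "D"): 2,   # normal in one, doubtful in the other
    ("F", "D"): 2,   # failure in one, doubtful in the other
}


def disagreement_rate(pair_counts):
    """Return the share of pupils whose two ratings differ."""
    total = sum(pair_counts.values())
    differing = sum(n for (a, b), n in pair_counts.items() if a != b)
    return differing / total


rate = disagreement_rate(far_point_pairs)
print(f"{rate:.1%} of {sum(far_point_pairs.values())} pupils were rated differently")
```

Four of the twenty-six pupils fall in the two disagreeing categories, which is the source of the rounded figure given in the text.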

The test of near-point-fusion shows the two examinations agreeing 
that the pupil had normal vision in twenty-four cases and that the 










Telebinocular Tests for Beginning Pupils 33 


pupil had defective fusion in one case, with a single case showing
normal in one test and failure in the other. In this case the failure
score is secured first, followed by a normal one. In the far-point- 
fusion test, in two of the cases of discrepancy, the normal rating is 




















































TABLE I¹
Columns, left to right: pupil; fusion, far point and near point (N, F, D); lateral imbalance, far and near (score); vertical imbalance; stereopsis (score); ametropia at 3.00, 0.50, and 0.00, left and right eye; visual acuity, both eyes, left eye, and right eye. Each measure is entered for Test I and Test II.

AININ|NIN(|7.5/8.514 |5.5)N|N 00| 00] 00| 00| 00] 00 90 70} 80 
BIN|NIN|N17.518 |4.555 |WIN 00|-00/] 00| 00] 00] 32 80} 80| 80 
C | D|N|N|N16.5/6.5/4.514.5| N| N 00| 00/ 20|] 00] 40/ 00 110} 80}11 100 
D|ININ|NIN(|8 |7.5)5 |4.5)N|N 00; 00/ 00] 00] on] 22 90 
E\|NIN|NINI9 |7 |5 |4.5)NIN 00 | 00! 00| 00] 00] 00} 90/1 110} 80)110 
FININIF|N(7 |7 |3 4 |WIN 00 | 00! 00} 00] 00] 00/100) 90)100| 90\100) 90 
G |NIN|NIN17.516 [4.53 |NIN 00 | 00 | 00] 00] 00 00 110) 90/110 90/100) 90 
HININININ|8 17.515 |4.5)N|N 00} 00} 00] 00) 00| 00 90| $0 90 
I |NININ|N|8 |7.5/4.514.5)N|N 00} 00 | 00] 00] 13] ©0| 80) 90) 80] 90 80 
J |N|N|N|N(7.5|7.5/4.5/4.5| N| N 00; 00/ 00] 00] 00! 00 |110/100 110} 90 
KIN|NININ(|8 {8 [5 l4.5) FP |N 00 | 00/} 00] 00| 00] 00} 80! 80 80 
LININININ(|8 |8 [4 l4.5)NIN 00 | 00/ 00| 00| 00/| 00} 90) 90 80 
MIN|NIN(N|7.5)8 (4.515 |NIN eo | ee] ee] oe] se] 20 | go] 90 90 
N |NIN|NIN(|8.5/8.515 |5.5|N|N 00} 00} 00] 06} 00| 00 90/100) 90) 90| 90/100 
O|NINININ(8 |8 [5 l4.5,NIN 00} 00| 00] 00] 40/| 00 90! 70] 70} 80] 80 
P| FINI N|N|5.516.52.544 |WIN 00 | 00 | 00} 00] 00 24 |100/100\100|100! 110! 100 
QINIDININ|1 |7.55 4 |WIN 00 | 00 | 00] 00] 00 | 00 |100! 90| 90) 90! 90) 90 
RI|NINININI6 |7 4 14 |NIN 00} 00| 00] 00] 00 | 00 | 90) 90| 90) 90] 90) 90 
S|N|FIF|FI8 |8.55 |5.5).N|N 00 | 00 | 00} 00] 00 {| 00| 80) 80| 80) 90] 80) 90 
T|NIN|NINI7.518 |4.514.5|N|N 00| 00| 00] 00] 23) 02| 80) 80 80} 70} 80 
UININININ(8 (7 |5 |5.5|NIN 00 | 00| 34] 01] 34] 22] 90! 90) 90] 90] 90) 90 
VIN|INININ|S (7 |4.5/4.5)N|N 00 | 00} 00/ 00| 32] 66| 70) 70] 70| 70! 701 70 
WININI|NIN|S (7.516 |4.5|N|N 66 | 66| 66| 66| 66 | 66 60] 90] 80| 80] 60| 80 
XININI|NINI7 |8 |4 |4.5)NIN | ## | o#] e¢] e¢| ©! 70) 70] 70! 70| 60] 70 
Y|NIN|NIN{8 [8.515 [5 ININ 00 | 00 {| 21] 00] 44]! 33] 90] 90) 80) 80| 90] 90 
ZININININ(G [6.54 4 |NIW 03 | 01] 03] 03] 34] 341 80] 80) 80] 80| 80) 80 






























































1 The records for each child comprise a horizontal line. Asterisk* means unsatisfactory test response. 


secured in the first test and in the other two in the second. On the 
whole, therefore, it appears that the child is no more likely to get a 
normal score on the second than on the first examination. 

In the test of lateral imbalance for far vision, actual scores are given 
in Table I. Of the twenty-six pupils, twenty-four show both tests 




























falling within the range of normal vision and two falling within the 
normal range in one test and in the range of abnormal in the second. 
There are, of course, slight differences in the actual score of those whose 
records fall within the range of normal vision but these variations are 
diagnostically unimportant. All of the pupils except two differ by 
one point or less. One pupil shows a difference of two points and one 
a difference of six and a half. In the test for lateral imbalance under 
conditions of near vision the results are substantially the same: twenty-
four were recorded as normal in both tests and two as normal in one
and abnormal in the other. It would appear advisable to give two 
or more examinations of the young pupil to secure reliable data con- 
cerning lateral muscular imbalance. 

The test for vertical imbalance is recorded merely as N or F. 
Twenty-five cases show N in both tests and one shows F in the first 
test and N in the second. 

The test for stereopsis was recorded in detail by noting the pupil’s 
response to each line on the test slide. In the table is entered only 
the number of the highest numbered line correctly interpreted. Young 
children tend to fail on some of the easier lines. Since the directions 
call for considering only the highest number, interpretations are made 
on this basis. One pupil is recorded as indeterminate on this test 
since it was impossible to get him to understand the directions. 
Eighteen children got a perfect or normal score in both tests; three 
children differed by at least one step, one by three, one by five, one 
by seven, and one by eight steps. This test, in other words, gives 
different results in a larger number of cases than any of the tests thus 
far considered. It is obvious that it is a test that should be given 
with care and repeated two or three times to give reliable results. 

In the test of ametropia two pupils could not be satisfactorily 
measured. Eliminating them, the total population is twenty-four. 
The test of ametropia at near vision distance shows identical scores in 
all cases for the left eye and an appreciable difference in one case for 
the right eye. Tests of ametropia in far vision disagree in three cases 
for the left eye and two cases for the right. Tests for ametropia at 
infinity show disagreement for the left eye in ten cases and for the 
right eye in eight cases. This third test gives more variable results 
than the other two. 
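
The comparison among the three ametropia tests is easier to see when the counts are put on a common base. The sketch below, in Python, assumes the twenty-four measurable pupils and the disagreement counts stated in the paragraph above; the labels and variable names are illustrative only.

```python
# Pupils with usable ametropia records: 26 less the 2 who could not be measured.
N_PUPILS = 24

# Number of pupils whose two trials disagreed, as stated in the text.
disagreements = {
    "near vision": {"left": 0, "right": 1},
    "far vision": {"left": 3, "right": 2},
    "infinity": {"left": 10, "right": 8},
}

# Convert each count to a proportion of the measurable pupils.
rates = {
    test: {eye: count / N_PUPILS for eye, count in eyes.items()}
    for test, eyes in disagreements.items()
}

for test, eyes in rates.items():
    print(f"{test:12s} left {eyes['left']:.1%}  right {eyes['right']:.1%}")
```

On these counts the test at infinity disagrees for roughly a third or more of the pupils in each eye, against about one in eight or fewer for the other two tests.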

The tests of the visual acuity yield scores varying from zero to 
one hundred ten per cent A.M.A. The highest score represents a 
Snellen feet score of 20/10, or a Snellen meter of 6/3. According to 






the Betts’ manual, a score of ninety per cent, which corresponds to 
Snellen feet 20/30 or better, may be considered normal. Table I gives 
the entries in terms of percentages for both eyes, the left eye, and the 
right eye. It should be noticed, first, that in the test for both eyes, 
thirteen of the pupils, or fifty per cent, have a score of ninety or more 
per cent in each of the two trials. Half of the pupils, in other words, 
are shown by both tests to have normal vision. Four of the pupils 
score a rating of normal vision on one of the tests and a lower vision 
on the other. The others fall below ninety in both tests. For four 
pupils, in other words, the results are uncertain. This is a sufficiently 
large proportion of the group to justify repeating the test at least for 
all pupils who receive a rating of ninety or lower on the first test. 
There is no case of a pupil who rated as high as one hundred on the 
first test falling below the ninety on the second. 

In the test for the left eye, eight pupils of the twenty-six rate 
ninety or above in both tests, nine cases score normal in one and sub- 
normal in the other, leaving nine cases in which both tests showed 
subnormal vision. In this table a number of large discrepancies 
appear. The tests for the right eye show eleven cases rating normal 
in both tests, four cases normal in one and subnormal in the other,
the remainder rating below ninety in both tests. It thus appears 
that visual acuity is more reliably measured in the test for both eyes 
than for either eye alone and that to secure substantially the same 
figure in the latter, several tests would be required. 
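
The sorting of pupils into the three groups used above can be stated as a simple rule. The following Python sketch assumes the ninety per cent criterion cited from the manual; the function `classify` and the example score pairs are hypothetical and are not read from Table I. The reported counts are those given in the text, with the remainders inferred from the total of twenty-six.

```python
from collections import Counter

NORMAL_CUTOFF = 90  # per cent A.M.A.; the criterion for normal vision cited in the text

LABELS = ("subnormal in both", "normal in one only", "normal in both")


def classify(first, second, cutoff=NORMAL_CUTOFF):
    """Label a pair of acuity scores by how many trials reach the cutoff."""
    trials_normal = int(first >= cutoff) + int(second >= cutoff)
    return LABELS[trials_normal]


# Hypothetical score pairs for illustration only; not taken from Table I.
example_pairs = [(100, 90), (90, 80), (80, 80), (110, 100), (70, 90)]
tally = Counter(classify(a, b) for a, b in example_pairs)
print(tally)

# Counts per group for the 26 pupils as reported in the text, in the order
# of LABELS. The "subnormal in both" counts for both eyes and for the right
# eye are the remainders implied by the text.
reported = {
    "both eyes": (9, 4, 13),
    "left eye": (9, 9, 8),
    "right eye": (11, 4, 11),
}
for test, counts in reported.items():
    same_group = counts[0] + counts[2]
    print(f"{test:9s} same group in both trials: {same_group} of {sum(counts)}")
```

This tally records only whether a pupil changes group between trials. It does not reflect the size of the score differences, so it cannot by itself represent the large discrepancies noted for the single-eye tests.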

Shifting from a consideration of the agreement between the two 
tests to a study of the pattern of visual difficulties revealed by the 
individual children, it is found that only five children have an entirely 
clean slate in all the twenty-six tests, that is, both trials of all thirteen 
tests. Of the remaining twenty-one, the majority receive doubtful 
scores, that is, a score falling slightly below normal in one or more of 
the tests. Of these, a majority fall slightly below the normal in only 
one of the tests of the function. Certain pupils, like pupils C, K, S,
and W, are in the doubtful group in at least one of the tests of each of 
several functions. In other words, on the basis of these examinations, 
approximately eighty-one per cent of the pupils show test results 
which suggest the advisability of a more careful visual examination. 
In comparison with data obtained on some of the older children, 
this is a surprisingly large proportion. Indeed, it is so large as to 
make it advisable to have a thorough-going examination of vision of 
all children on entering school. 
























There remains, however, the question whether the large proportion 
of low scores or doubtful performances on the test is due entirely to
visual factors. The experience of various examiners who have
tested children in entering school in our PWA projects has been that 
a large proportion of the children are difficult to test, either because 
they do not understand exactly what to do, or because of various 
faults in taking the test or reporting results. The same opinion has 
prevailed concerning other tests, including many of the types com- 
monly recommended for determining reading readiness. As will be 
shown in a later article, in some of the reading readiness and other short 
diagnostic tests, such as those contained in the Gates and other series, 
many of these children obtained much higher scores after a few weeks 
or months of experience in school. It should be pointed out that 
variable results due to lack of experience in taking the test at the time 
of entering school are no reflection upon the value or reliability of the
test for use for most purposes at a somewhat later time. Indeed, the 
data in general, which are presented in this article only in part, suggest 
the advisability of giving entering pupils some degree of experience in 
taking various tests before examinations, the results of which will be 
actually used for practical purposes, are given. Some of the young 
pupils need to learn, for example, how to apply their attention to the 
test rather than to the examiner, how to maintain their attention for 
at least some time, how to render a report relevant to the assignment, 
and how to confine themselves to the activities designated by the 
directions instead of doing other miscellaneous irrelevant things. In 
other words, it is possible that, before reliable results can be secured 
from any kind of test, the first-grade child must be given a certain 
amount of experience in following test directions and working con- 
sistently in a test situation. 










TEN EXPERIMENTS ON WHOLE AND PART 
LEARNING 


MILTON B. JENSEN AND AGNES LEMAIRE 
Louisville, Kentucky 
INTRODUCTION 


A considerable amount of literature on the whole-part method of 
learning has become available since the beginning of the Twentieth 
Century. Definite conclusions cannot be drawn from many of the 
investigations because the experiments were poorly conducted or 
because of too few subjects, etc. On the other hand, many of the 
investigations were conducted most excellently and some valid conclu- 
sions have been established. At the present time, however, it cannot 
be stated positively that either the whole method or any form of the 
part method is superior for either learning or retention purposes. 
More carefully planned experimentation is necessary before statisti- 
cally valid results can be determined about these two methods of 
learning. 

In 1931 Grace O. McGeoch published a critical analysis of the whole-
part problem.¹ This was a comprehensive survey of the field from 1900
to 1930, some thirty-odd studies being reviewed. In the course of 
her paper she suggested that the relative efficiencies of the whole and 
part methods can be determined by the following seven factors: 
(1) Forms of method used, (2) methods of measuring efficiency of 
learning, (3) subjects, (4) material used, (5) amount of practice, 
(6) method of measuring retention efficiency, and (7) the length of the 
interval after which retention is tested. She concluded her article by 
stating that the average data do not justify any generalization about 
the reciprocal and differential effects of these factors, much less recom- 
mend the use of any particular method of learning for classroom use. 

During the past few years Dr. McGeoch has been a careful investi- 
gator of the whole and part methods of learning, not only reviewing the 
literature in the field but also conducting experiments of her own relat- 
ing to various phases of the problem. The following short summaries 
of her findings and conclusions are given to show that the whole-part 
problem is by no means settled and that it is decidedly a problem for 
further research: 





1 McGeoch, G. O.: “Whole-part problem.” Psychol. Bul., Vol. XXVIII, 1931,
pp. 713-739.











1. The Intelligence Quotient as a Factor in the Whole-part Problem.¹—No
reliable differences were found between the whole, progressive part, and pure 
part methods with nine and ten-year-old children in the learning or retention 
(after a twenty-four-hour interval) of twelve lines of rather abstract and 
uninteresting poetry. With the same group of subjects, however, a reliable 
superiority was found for the whole method over the pure part method in the 
learning and retention of vocabulary material, lists of ten Turkish-English 
pairs and lists of ten nonsense syllable English pairs. In learning, the advan- 
tage of the whole method was much greater with the gifted children (Mean IQ, 
151) than with the normal group (Mean IQ, 99). There were, however, no 
consistent differences between the whole and progressive part or between 
the progressive part and pure part methods. 

2. A Revaluation of the Whole-part Problem in Learning.²—The discussion
in this article was limited to the field of memorizing meaningful material. 
Quotations were cited from seven leading psychology textbooks, five of them 
stating positively that the whole method is superior to the part method, both 
in learning and retention. The author summarizes five studies which she 
considered amenable to statistical treatment, because of the number of sub- 
jects used and the precision of the experimental work. She emphasized the 
need for re-evaluation of the experimental data and at the same time called 
into question the so-called textbook superiority of the whole method. 

3. Whole-part Problem in Memorizing Poetry.³—There were two hundred
thirty-eight children who completed this experiment but only the records of 
one hundred seventy-two were used, the remainder being rejected because 
they used the wrong method. They selected their own methods of learning, 
but only two kinds were considered: Line-by-line and verse-by-verse learning. 
The data from this experiment show that all the differences between the two 
methods, with the exception of those in the percentages of retention, are 
in favor of line-by-line learning. The differences in the percentages of reten- 
tion were likewise not valid. On the basis of these results it can be concluded 
that there are no significant differences between two spontaneously and habit- 
ually employed methods of memorizing, two progressive part methods being 
used, the one involving units of a single line and the other units of four lines. 

4. The Conditions of Reminiscence.⁴—Four hundred forty children completed
the experiment although the records of only three hundred ten were
used, the poem being familiar to those whose records were not used. There 


1 McGeoch, G. O.: “The IQ as a Factor in the Whole-part Problem.” J. Exp.
Psychol., Vol. XIV, 1931, pp. 333-358.

2 McGeoch, G. O.: “A Revaluation of the Whole-part Problem in Learning.” J. Educ.
Res., Vol. XXVI, 1932, pp. 1-5.

3 McGeoch, G. O.: “Whole-part Problem in Memorizing Poetry.” Ped. Sem., Vol.
XLII, 1933, pp. 439-447.

4 McGeoch, G. O.: “The Conditions of Reminiscence.” Am. J. Psychol., Vol.
XLVII, 1935, pp. 65-89.



















were no reliable differences between the whole method and the part method 
in either learning or retention. It may be concluded from these results that 
with poetry, which is appealing to children, there are no significant differences 
between the whole and the part methods of learning. Mary Howitt’s “The
Spider and the Fly” was the poem used in this experiment, while in Experiment
No. 1 above, Bryant’s “An Autumn Walk” and “A Life Time” were
the poems used.


A. J. Davis and M. Meenes made a study of one hundred four
college students in an endeavor to determine the effects of place
association, sex, age, capacity, and habitual method of learning on
whole-and-part learning.¹ They concluded in general that more than half
of the individuals participating in the experiment found the whole
method superior; only twenty students found the part method superior.
Of the individuals who found one method superior, only thirty-five to
forty-one per cent found that method superior which they habitually
used. Male and female subjects learned approximately equally well
with the part method, but the female subjects were inferior to the
males with the whole method. The younger individuals learned better
by the whole method, although the difference was not reliable.

J. B. Stroud and C. W. Ridgeway,² using college students under
conditions of learning to complete mastery under massed conditions, 
found the whole method to be less economical than the progressive 
part or pure part method, but they found no significant differences 
between these three methods in connection with the retention of the 
materials learned. 

Since experimental data so far reported in the literature do not
justify any definite conclusions as to the superiority of either of the
methods for learning or retention purposes, the crucial questions seem
to be under what conditions either of the methods is superior to the
other and for what individuals one method is superior to the other.


OUR EXPERIMENTS 


In the spring of 1935, investigations on whole-part learning were
made by teachers registered in the “Educational Statistics and Experimental
Education” class of the Division of Adult Education, University





1 Davis, A. J. and Meenes, M.: “Factors Determining the Relative Efficiency
of the Whole and Part Method of Learning.” J. Exp. Psychol., Vol. XV, 1932,
pp. 716-727.

2 Stroud, J. B. and Ridgeway, C. W.: “The Relative Efficiency of the Whole,
Part, and Progressive Part Methods When Trials Are Massed: A Minor Experiment.”
J. Educ. Psychol., Vol. XXIII, 1932, pp. 632-634.







of Louisville. Teachers from six secondary schools (five senior high 
schools and one junior high school) and one elementary school partici- 
pated, their students serving as subjects. A total of one hundred 
ninety-five boys and girls took part in the equating of materials and 
six hundred forty-eight other children took part in the experiments 
proper. Ten different experiments were conducted in three subject
fields, i.e., (1) the learning of poetry, (2) the learning of statements
relating to chemistry, and (3) the learning of simple directions relating
to typewriting. The members of the class performed these experiments in
their respective classrooms, working either individually or in groups of two.

Selection of Materials.—Each teacher or group of teachers chose the
learning material to be used. In some cases this material was selected
from existing sources, while in others it was composed by the investigators. In six of the
experiments, poetry was used; in two others, statements about the 
chemistry of a metal and a non-metal were used; while in the remaining 
ones, two sets of simple directions for typewriting pupils were used for 
memorization purposes. These materials are given in the Appendix. 

Equating the Materials.—In an effort to use materials for both the 
whole and part methods of learning that were of equal difficulty, the 
materials selected were presented to control or equating groups. In 
most instances, two stanzas or other learning materials, which were 
thought to be of about equal difficulty, were presented to groups of 
children to be learned by any method desired. (Hereafter, for con- 
venience of discussion, any two sets of material are referred to as Forms 
I and II.) Form I was given to half of the control group and at the 
same time Form II was given to the other half of the group. As soon 
as these materials were learned and the children had written down what 
they could recall, Form I was given to the group of children who had 
just learned Form II, while Form II was given to the group of pupils 
who had just learned Form I. This method of presentation was used 
in order to prevent either group having more practice upon one form 
than upon the other. The scores on Form I were then correlated with 
the scores made on Form II, product-moment correlation being used. 
These coefficients are treated as reliability measures throughout the 
study. By reference to Table I it will be noted that all but one of the 
reliability coefficients are high enough for group comparisons and that 
this one is not so low as to be valueless. These coefficients are, of 
course, used in determining the significance of differences later reported 
in the study. It will be noted further from Table I that the dif- 
ferences between means for the two forms divided by their probable 







errors range from .13 to 1.77, all well below the limits of statistical 
significance, so that the two forms used may be considered of equal 
difficulty in all instances. 
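The equating step described above can be illustrated with a short Python sketch. It computes the product-moment correlation between scores on Forms I and II and attaches a probable error to it. The scores are hypothetical, and the probable-error formula for r, .6745(1 - r²)/√N, is the classical textbook expression assumed here; the article itself does not print it.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two paired score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error_of_r(r, n):
    """Classical probable error of r (assumed formula, not from the article)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Hypothetical scores of eight pupils on the two forms (illustration only).
form_1 = [30, 35, 28, 40, 33, 37, 25, 31]
form_2 = [31, 34, 27, 41, 32, 38, 26, 30]

r = pearson_r(form_1, form_2)
pe_r = probable_error_of_r(r, len(form_1))
print(f"r = {r:.2f} +/- {pe_r:.2f}")
```

A high r with a small probable error is what the text treats as adequate reliability for group comparisons.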

The Experiments Proper.—After the materials had been thus 
equated the investigators proceeded with the experiments proper. 
Other groups of pupils were supplied with paper for writing down what 
they learned. The materials to be memorized were read to the children 
by the teachers in order to control the learning situations. The form 
to be learned by the whole method and the one to be learned by the 
part method were determined in advance. After each form had been 
read to the learning group a prescribed number of times, the number 
of times each line was read being the same for both methods, sufficient 
time was given for writing down what had been learned. When two 
separate classes were used for experimental purposes, the order of 
presentation of methods was reversed, thus: The whole method first, 
then the part method to one class; to the second class, the part method 
was presented first, then the whole method. As in the equating of 
materials, this was done to equalize practice effects. The number of 
minutes for learning purposes varied with the different experiments. 
No particular form of the part method was specified. Accordingly 
several forms were used in the various experiments. All papers were 
scored objectively, each major idea being given a credit of one. The 
data for the whole method were correlated with the data for the part 
method, the differences between means found, the probable errors of 
these differences determined, and the differences between means 
divided by their probable errors. When a difference between means 
was four or more times its probable error it was concluded that there 
was a significant difference between the effectiveness of the two learning 
methods so far as these experiments are concerned. The nature of the 
experiments and the findings are discussed briefly below. The data 
are summarized in Table I. The learning materials are given in the 
Appendix.
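The significance criterion stated above (a difference between means of four or more times its probable error) can be sketched as follows. The article does not give its formula for the probable error of a difference. This sketch assumes the standard expression for correlated means, since each pupil learned by both methods, and all numbers are hypothetical.

```python
import math


def pe_of_mean_difference(sd_whole, sd_part, r, n):
    """Probable error of the difference between two correlated means.

    Assumed formula: .6745 * sqrt((s1^2 + s2^2 - 2*r*s1*s2) / n).
    """
    variance = (sd_whole ** 2 + sd_part ** 2 - 2 * r * sd_whole * sd_part) / n
    return 0.6745 * math.sqrt(variance)


def critical_ratio(mean_whole, mean_part, sd_whole, sd_part, r, n):
    """Difference between means divided by its probable error."""
    pe = pe_of_mean_difference(sd_whole, sd_part, r, n)
    return (mean_whole - mean_part) / pe


def is_significant(ratio, threshold=4.0):
    """Apply the four-probable-error criterion used in the study."""
    return abs(ratio) >= threshold


# Hypothetical group: 50 pupils, whole-method mean 22, part-method mean 20.
ratio = critical_ratio(22.0, 20.0, 5.0, 5.0, r=0.8, n=50)
print(f"d/PE = {ratio:.2f}, significant: {is_significant(ratio)}")
```

A positive ratio here favors the whole method and a negative one the part method; the sign convention is a choice made for this sketch.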


EXPERIMENT I 


Equating of Materials.—Twenty-seven boys and girls in the 6B class of an
elementary school participated in this part of the experiment. Typewritten
copies of two stanzas from an original poem entitled “Summer Days”
(Appendix A) were used for learning purposes. Ten minutes were allowed
for learning by any method desired.

Experiment Proper.—Fifty-eight boys and girls from the fifth and sixth 
grades from the same school composed the main experimental group. Stanza 







I was presented by the whole method, Stanza II by the part method. Each 
stanza was read eight times. The form of the part method of learning used 
was as follows: The first line was read eight times, the second line eight times, 
and so on, until the stanza was completed, at no time returning to any previous 
line. So far as group averages are concerned, the whole method was slightly 
superior, but the difference was not statistically significant. 


EXPERIMENT II 


Equating of Materials.—Twenty-eight boys and girls from an 8A class of
a junior high school were presented with two stanzas of eight lines each of a
poem to be learned by any method (Appendix B). These were learned from
copy, four minutes being allowed for memorization purposes.

Experiment Proper.—Thirty pupils from the 7B class of this same junior 
high school learned these two stanzas by both methods of learning, Stanza I 
being learned by the whole method and Stanza II by the part method. For 
the whole method, the first stanza was read all the way through four times. 
For the part method, each line was read four times by the teacher before pro- 
ceeding to the next line. A very significant difference in favor of the whole 
method was found so far as group averages are concerned. 


EXPERIMENT III 


Equating of Materials.—Twenty senior-high-school girls took part in this
phase of the experiment. The two stanzas from “Summer Days” used in
Experiment I were used as learning materials. These were written on the
blackboard, four minutes being allowed for memorization by any method
desired.

Experiment Proper.—Fifty-two other girls from the same high school 
participated in the experiment proper. Stanza I was presented by the part 
method, Stanza II by the whole method, just the reverse of the procedure 
used in Experiment I. A modified form of the part method of learning was
used: The first two lines were read three times, then lines three and four were
read two times, then the four lines were read twice; next, lines five and six
were read three times, then all six lines were read twice. The time used for
both methods was three minutes. The results were in favor of the whole
method.


EXPERIMENT IV 


Equating of Materials.—Thirty boys from senior high school B learned
two selections from Browning’s “Andrea del Sarto” (Appendix C) by the
method they preferred. Five minutes were allowed for the learning of the
material. The investigator told them that the results of the experiment
would be used in connection with their monthly grades. (In all the other
experiments, the pupils were told that experiments were being made on
whole-part learning and that the results would in no way affect their school grades.)







Experiment Proper.—Fifty-four additional boys from senior high school B 
served as subjects. The pure part method was the form of the part method 
used, each line being read seven times, at no time returning to lines previously 
read. A slight difference favored the whole method of learning. 


EXPERIMENT V 


Equating of Materials.—Twenty girls from senior high school C learned the
poem “Significance of Color” (Appendix D). This sixteen-line poem was
divided into two equal parts, each part being read to the children six times. 
No mention was made of the manner of learning to be employed. 

Experiment Proper.—Forty-nine other girls from this same senior high 
school participated in the experiment proper. The first eight lines of the 
poem were presented by the whole method, the last eight lines by the part 
method. The part procedure used was as follows: The first line was read four 
times, the second line four times, then the first and second lines were read 
together; next the third line was read four times, the fourth line four times, 
then lines three and four were read together; lines five and six were then read 
separately four times, then they were read together; lines seven and eight 
were read in the same manner; next all eight lines were read through once. A 
slight difference was found in favor of the whole method. 


EXPERIMENT VI 


Equating of Materials.—Twenty-five additional girls from senior high 
school C learned the poem used in Experiment V from copy by any method 
they wished. Eight minutes were allowed. 

Experiment Proper.—Sixty-three other girls from the same senior high 
school participated in the experiment proper. The first eight lines were used 
for whole learning, the last eight for the part method. Each part was read 
six times. The part procedure used was as follows: Lines one and two were 
read three times; lines three and four were read three times; then lines one, 
two, three and four were read together once; next, lines five and six were 
read three times, then lines one to six inclusive were read together; lines seven 
and eight were read four times, then lines five, six, seven and eight were read 
together; then all eight lines were read through once. A significant difference 
in favor of the whole method of learning was found in this experiment. 
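The general procedure requires that each line be read the same number of times under both methods. A small Python sketch can tally the readings per line for the part procedure of Experiment VI. The schedule is transcribed from the description above; where the text says lines were "read together" without giving a count, one reading is assumed.

```python
from collections import Counter


def readings_per_line(schedule):
    """Count how often each line is read, given (first, last, times) blocks."""
    counts = Counter()
    for first_line, last_line, times in schedule:
        for line in range(first_line, last_line + 1):
            counts[line] += times
    return dict(counts)


# Part procedure of Experiment VI as (first line, last line, readings).
# Blocks described only as "read together" are assumed to be read once.
EXPERIMENT_VI_PART = [
    (1, 2, 3),  # lines one and two, three times
    (3, 4, 3),  # lines three and four, three times
    (1, 4, 1),  # lines one to four together, once
    (5, 6, 3),  # lines five and six, three times
    (1, 6, 1),  # lines one to six together (count assumed)
    (7, 8, 4),  # lines seven and eight, four times
    (5, 8, 1),  # lines five to eight together (count assumed)
    (1, 8, 1),  # all eight lines, once
]

counts = readings_per_line(EXPERIMENT_VI_PART)
print(counts)
```

Under these assumptions every line is read six times, which agrees with the statement that each part was read six times.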


EXPERIMENT VII 


Equating of Materials.—Twenty girls from senior high school C had read
to them by the whole method two sets of chemistry statements, one set about
a metal, the other about a non-metal (Appendix E). The materials were
read through five times.

Experiment Proper.—Eighty-one additional girls from this same senior 
high school participated in the experiment proper. The set of statements 
















about Germanium was learned by the whole method, while those about 
Selenium were learned by the pure part method. A very significant differ- 
ence in group averages favored the whole method of learning. 


EXPERIMENT VIII 


Equating of Materials.—The equating of materials was done in high school 
C as described in Experiment VII. 

Experiment Proper.—Ninety-nine girls from high school A participated in 
the experiment proper. The statements about Selenium were used for whole 
learning and those about Germanium for part learning, just the opposite of 
the procedure used in Experiment VII. Each line was read five times. 
The pure part method was employed. The data show a significant difference 
between means in favor of the part method, again just the reverse of the 
findings of Experiment VII in high school C. We are unable to give any 
satisfactory explanation for the dissimilarity of results from Experiments 
VII and VIII. 


EXPERIMENT IX 
(Called IXa in Table I)


Equating of Materials.—Twenty-five pupils in high school D served as
subjects for equating the two sets of simple typewriting instructions
(Appendix F). Typewritten copies of the instructions were given to the children to
be learned by whatever method they wished. Four minutes were allowed.

Experiment Proper.—Forty-one other pupils from this high school partici- 
pated in the experiment proper. Set I was presented by the part method of 
learning, Set II by the whole method. Each set was read five times. A modi- 
fied form of the part method was employed, thus: Sentences one and two were 
read to the class four times, then sentences three and four were read four 
times; next, sentence five was read four times, then all five sentences were 
read once. A significant difference in favor of the part method was found. 


EXPERIMENT X 


Equating of Materials.—This part of the experiment was done in high 
school D as described in Experiment IX. 

Experiment Proper.—Forty pupils from high school E participated. The 
same two sets of simple typewriting instructions were used. Set I was used 
for part learning and Set II for whole learning. All procedures were the same 
as those used in Experiment IX. 


The data given in Table I are in keeping with previously reported 
investigations: There is no general trend in favor of either the whole or 
part method so far as economy of learning is concerned when measured 
by group averages. In five of our ten experiments there were significant 








TABLE I.—DATA FROM TEN EXPERIMENTS ON WHOLE-PART LEARNING

[Table I was printed sideways in the original and its entries are not legible in this copy. For each experiment it lists the material learned; for the equating groups, N, grade, the means of the two forms, r with its probable error, and d/PE; for the experimental groups, N, the mean scores by the whole and part methods, r with its probable error, d/PE, whether the difference is significant, and the superior method.]

* Groups IXa and IXb combined.










group differences, three of them favoring the whole method and two 
favoring the part method or a modification of it. Whether the type 
of material learned is a significant factor in determining effectiveness of 
method cannot be determined satisfactorily from our data. Probably 
it is significant that group averages were higher for the whole method 
in all six of our studies where poetry was learned, two of these being
significantly so. As to why the whole method was superior with one
group of senior-high-school girls (N = 81) in memorizing chemical 
data and less effective with the other group (N = 99) we have no 
explanation. Examination of the data shows no marked differences 
between the two groups: They are from very similar types of schools, 
having very comparable socio-economic backgrounds. The teachers 
administering the tests were excellently trained and most careful in 
their work. 

Our data indicate that, though generalizations as to superiority of 
method cannot be drawn so far as groups are concerned and without 
reference to the type of material learned, there are many children who 
learn significantly better by one method than by the other, and their 
habitual methods of learning are not satisfactory criteria as to which 
method is superior for them. In investigating this we determined the 
probable errors of differences between single raw scores for the ten 
experimental groups and translated all differences between scores by 
the two methods into probable errors favoring either the whole or part 
method. The correlations between the two forms with the equating 
groups were used as reliability coefficients and the standard deviations 
of the experimental groups were used as the measures of variability. 
Kelley’s* formula was used in calculation of the probable errors: 


$$PE_{\text{score}} = .6745\,\sigma\sqrt{1 - r_{1I}}$$
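As an illustration of the formula, the following Python sketch computes the probable error of a single score from a standard deviation and a reliability coefficient. The input values are hypothetical and were chosen only to show the order of magnitude; the function name is likewise an invention of this sketch.

```python
import math


def probable_error_of_score(sd, reliability):
    """Probable error of a single score: .6745 * sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return 0.6745 * sd * math.sqrt(1.0 - reliability)


# Hypothetical group: standard deviation 6.5, reliability coefficient .94.
pe_score = probable_error_of_score(6.5, 0.94)
print(f"PE of a single score = {pe_score:.2f}")
```

The higher the reliability coefficient, the smaller the probable error, so a given raw difference between a pupil's two scores counts for more probable-error units.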


These findings are presented in Table II. The numbers in the 
caption refer to the various experiments. Probable error values are 
given at the left of the table. The zero row contains all values between 
one probable error in favor of the part method and one probable error 
in favor of the whole method. All values between one and two prob- 
able errors are included under the probable error index, 1; all between 
two and three are under the index, 2; etc. With the exception of 
Experiment 7, for which the data were not available, the method 
habitually used is indicated. In Experiment 1, for instance, six 





* Kelley, T. L.: Interpretation of Educational Measurements. World Book Co., 
1927, Formula 17. 




TABLE II.—DIFFERENCES BETWEEN SCORES BY WHOLE AND PART METHODS OF
LEARNING EXPRESSED AS PROBABLE ERRORS OF THESE DIFFERENCES


(Probable-error values range from fifteen in favor of the part method to
fourteen in favor of the whole method. Class indices are at their class limits farthest
from zero. The zero class includes all values between 1 PE in favor of the part
method and 1 PE in favor of the whole method. The probable error of a single
score is given in the PE row for each of the experiments. Habitual method of
learning is indicated by P for the part method and W for the whole method. Data
for Experiment IX include those given for Experiments IXa and IXb in Table I.)














Experiment     I      II     III    IV      V     VI    VII    VIII   IX     Total
N              58     30     55     54      49    63    81     29     81     504
PE             1.08   2.17   2.17   11.18   .75   .68   1.09   1.49   2.41





























Probable errors (part method superior)
Significant (17.1 per cent)









































15 P 1 1 
Ww 

14P 
Ww 1 1 

13P 
Ww 1 1 

12P 
Ww 1 1 

11P 1 1 
Ww 

10 P 1 1 
Ww 1 1 
9P 3 3 
Ww 

“SP 4 1 . 1 - 6 
Ww ov l ies a : 1 
ge . a 2 3 yg 
Ww 1 = eis 3 a 
6P 2 i 2 ne 3 aif 1 1 y 
WwW 1 - * ix ca ; 1 2 
§P + 2 iia 6 2 oy 2 3 19 
Ww 

4P 3 sé 1 3 4 1 3 2 17 
Ww a ai 1 ~~ ~ 1 1 3 6 

Not significant (65.7 per cent) 

3P 2 re ref 1 Sy o 1 3 6 13 
Ww ae vee l . na ion l 1 3 
2P 2 1 4 7 1 3 1 7 2 28 
Ww 2 ve 1 1 se 2 1 7 
1P 4 2 5 7 2 8 1 4 10 43 
Ww a = 1 2 ‘es ae 1 a 
OP 3 6 10 20 3 5 18 7 1l 83 
WwW 5 3 3 2 7 4 be 2 5 31 
1P 1 3 1 4 6 1 2 18 
Ww 2 2 4 1 5 ~ 7 1 6 29 
2P 3 1 1 3 2 7 15 3 35 
Ww 1 2 3 2 3 4 aie 8 23 







TABLE II.—(Continued)


























Experiment     I      II     III    IV      V     VI    VII    VIII   IX     Total
N              58     30     55     54      49    63    81     29     81     504
PE             1.08   2.17   2.17   11.18   .75   .68   1.09   1.49   2.41














Probable errors (whole method superior) 
Significant (17.2 per cent) 





3P 1 2 1 2 i es ® 1 15 
Ww 2 2 2 1 fl as Wd 1 & 
4P 1 sie 3 5 i") 
Ww nk 3 ee 5 me 3 11 
5P 2 3 2 & 7“ 15 
Ww 2 4 ane 1 ie 1 & 
6P 2 2 s 12 
Ww + A 2 2 
7P 1 my 1 1 3 
Ww 1 3 iF 4 
8P 1 1 6 8 
Ww 1 1 re 2 
9P x 1 2 3 
Ww 1 Nis 1 2 
10 P 
Ww 
11 P 1 1 
Ww 
12P 
Ww 
13 P 
Ww 
14P 
WwW 1 1 



































children scored between six and seven probable errors better by the 
part method than by the whole method. Two of these used the part 
method habitually and one habitually used the whole method. Taking 
the ten groups as a whole, 17.2 per cent learned significantly better by 
the whole method and 17.1 per cent, significantly better by the part 
method, four or more probable errors being counted as statistically 
significant. We may say with reasonable assurance that about one- 
third of the children learned significantly better by one method than 
by the other and that they would be discriminated against if required 
to use the method which is less effective for them. 
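The classification behind Table II can be sketched in Python as follows. The sketch assumes that each pupil's difference between the two scores is divided by the probable error of a single score for that experiment, and that the class index is the whole number of probable errors exceeded, as in the running text. The scores and the helper name `pe_class` are hypothetical.

```python
import math


def pe_class(whole_score, part_score, pe_score):
    """Return (class index, favored method, significant?) for one pupil.

    The index is the number of whole probable errors in the difference;
    four or more probable errors count as significant.
    """
    units = (whole_score - part_score) / pe_score
    index = math.floor(abs(units))
    if index == 0:
        favored = "neither"
    elif units > 0:
        favored = "whole"
    else:
        favored = "part"
    return index, favored, abs(units) >= 4.0


# Hypothetical pupils: (score by whole method, score by part method, PE).
pupils = [(20, 13, 1.08), (10, 10.5, 1.08), (12, 16, 2.17)]
classes = [pe_class(w, p, pe) for w, p, pe in pupils]
share_significant = 100 * sum(sig for _, _, sig in classes) / len(classes)
print(classes, f"{share_significant:.1f} per cent significant")
```

Tallying the classes over all pupils of an experiment, separately for habitual part and whole learners, would give the kind of column that Table II reports.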

Of the children for whom we have adequate data, 29.2 per cent 
use habitually the method which is less effective for them. Of those 
learning significantly better by one method than by the other, for 




whom our data are complete, 36.1 per cent habitually use the method 
which is less effective for them. 

Our data show no superiority of either the whole or part method of 
learning for either high- or low-IQ groups. Consequently, neither 
method is more adapted to one group than to the other. Students
having IQ ratings above 111 (one hundred eight in number) were compared
with students having IQ ratings below 95 (one hundred seven in
number). The two groups comprised approximately forty-three per 
cent of the entire distribution. The basic data are given in Table III. 
The values used are those reported in Table II but only for the 
extremes of the distribution. It will be noted that, on the average, 
both groups are less than one probable error of a single score away from 
no difference between the whole and part methods. It is interesting 
that only 11.1 per cent of the high-IQ group deviate four or more 
probable errors from no difference between methods while 23.3 per cent 
of the low-IQ group so deviate. It may be noted further that only
17.2 per cent of these two extreme groups combined deviate four or
more probable errors from no difference while 34.3 per cent of the entire
distribution made significantly better scores by one method or by the
other. In our investigations, then, students of median IQ are much 
more likely to learn better by one method, either the whole or part, 
than by the other. 
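The percentages quoted in this paragraph can be checked with a few lines of Python. The group sizes and percentages are those given in the text, and the total of five hundred four children is taken from the summary; the function itself is only an illustrative weighted average.

```python
def combined_percent(groups):
    """Weighted average of group percentages; groups are (n, per cent) pairs."""
    total_n = sum(n for n, _ in groups)
    weighted_sum = sum(n * pct for n, pct in groups)
    return weighted_sum / total_n


high_iq = (108, 11.1)  # N and per cent deviating four or more PE
low_iq = (107, 23.3)

combined = combined_percent([high_iq, low_iq])
share_of_distribution = 100 * (high_iq[0] + low_iq[0]) / 504

print(f"combined extreme groups: {combined:.1f} per cent")
print(f"share of entire distribution: {share_of_distribution:.1f} per cent")
```

The results round to 17.2 per cent and 42.7 per cent, in agreement with the figures of 17.2 per cent and approximately forty-three per cent given above.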


TABLE III.—COMPARISON OF HIGH- AND LOW-IQ GROUPS IN LEARNING BY WHOLE
AND PART METHODS














                                      High-IQ group   Low-IQ group   d/e
                                      (above 111)     (below 95)

Mean IQ ...........................       118.2           88.0
N .................................       108             107
Mean in PE of single score ........       -.12*           .21        434
Per cent scoring significantly better by 

ss cb adnes own ahaagien > § « 6.5 11.2 
Per cent scoring significantly better by 

INI. cn Davebeaed viwwewaue 4.6 12.1 








* A minus sign shows superiority by the part method. 


SUMMARY AND DISCUSSION 


1. Previously reported investigations indicate that neither the 
whole method nor the part method of learning nor any modification of 




them is generally superior and that much remains to be done before 
proper applications can be made in the field of pedagogy. 

2. Ten investigations are reported in this study. Six of them deal 
with the memorization of poetry, two with chemical information and 
two with directions about typewriting. The numbers of cases range 
from thirty to ninety-nine, a total of six hundred thirty-eight children. 
In all instances the materials were equated prior to the experiments 
proper to determine whether they were of comparable difficulty. 

3. The whole method of learning was superior in all six of the experi- 
ments involving poetry, so far as group averages were concerned, the 
differences between means being statistically significant in two of the 
six experiments. 

4. The whole method was superior with one group and the part 
method with the other group memorizing chemical information, the 
differences between means being statistically significant in both 
instances. We have no explanation as to why this reversal of superior- 
ity was found. 

5. In memorizing materials about typing, one group found the part 
method superior to the whole method. There was no significant differ- 
ence between group averages in the other experiment. When the two 
groups were combined, group averages were not significantly different. 

6. Of the five hundred four children for whom we have adequate
data, 34.3 per cent learned significantly better by one method or the
other; 17.2 per cent learned significantly better by the whole method
than by the part method; 17.1 per cent learned significantly better by
the part than by the whole method.

7. The child’s habitual method of learning is not a satisfactory 
indication of which method he uses more effectively. Of the four 
hundred twenty-three children for whom we have adequate data, 
29.2 per cent use habitually the method which is less effective for them. 
Of those learning significantly better by one method than by the other, 
36.1 per cent habitually use the method which is less effective for them. 

8. Neither method was found superior for either the high-IQ or 
low-IQ group nor was there a significant difference in the effectiveness 
of the methods for them, so far as group averages are concerned. The 
low-IQ group had more cases of extreme deviation in favor of both 
methods, however. Neither the low-IQ nor the high-IQ group con- 
tained as large a percentage of children who learned significantly better 
by one method or the other as did the remainder of the distribution or 
the distribution as a whole. 




9. Our data indicate that, where either method of memorizing is 
prescribed, or where the child is left to choose his own method, a 
significantly large percentage will be handicapped. The desirable 
pedagogical procedure would be to determine whether a child learns 
significantly better by one method than by the other and have him use 
the more effective. This could be done by the teacher in a few minutes, 
using the procedures we have employed in these experiments. If the 
learning materials included in the Appendix are very unlike those the 
child is to learn, comparability of materials to be learned could be 
determined readily by the procedures we have outlined under “‘ Equat- 
ing of Materials.” 


BIBLIOGRAPHY 


Studies not summarized by McGeoch in No. 7 below. 
1. Beeby, C. E.: “An experimental investigation into the simultaneous
    constituents in an act of skill.” Brit. J. Psychol., Vol. XX, 1930, pp. 336-353.
2. Crafts, L. W.: “Whole and part methods with unrelated reactions.” Amer.
    J. Psychol., Vol. XLII, 1930, pp. 591-601.
3. Davis, A. J. and Meenes, M.: “Factors determining the relative efficiency of
    the whole and part method of learning.” J. Exp. Psychol., Vol. XV, 1932,
    pp. 717-727.
4. Decroly, O.: “Le principe de la globalisation appliqué à l’éducation du langage
    parlé et écrit.” Arch. de Psychol., Vol. XX, 1927, pp. 324-346.
5. Ewald, F.: “Untersuchungen über die Komplexweite des Gedächtnisses.”
    Arch. f. d. ges. Psychol., Vol. LXVII, 1929, pp. 161-240.
6. Libby, W.: “An experiment in learning a foreign language.” Ped. Sem., Vol.
    XVII, 1910, pp. 81-96.
7. McGeoch, G. O.: “Whole-part problem.” Psychol. Bul., Vol. XXVIII, 1931,
    pp. 713-739.
8. McGeoch, G. O.: “The IQ as a factor in the whole-part problem.” J. Exp.
    Psychol., Vol. XIV, 1931, pp. 333-358.
9. McGeoch, G. O.: “A revaluation of the whole-part problem in learning.”
    J. Educ. Res., Vol. XXVI, 1932, pp. 1-5.
10. McGeoch, G. O.: “Whole-part problem in memorizing poetry.” Ped. Sem.,
    Vol. XLIII, 1933, pp. 439-447.
11. McGeoch, G. O.: “The conditions of reminiscence.” Am. J. Psychol., Vol.
    XLVII, 1935, pp. 65-89.
12. Michael, E.: “Ein Schulversuch über das Lernen in Teilen und in Ganzen.”
    Zsch. f. päd. Psychol., Vol. XVII, 1916, pp. 519-522.
13. Pyle, W. H.: “Retention as related to repetition.” J. Educ. Psychol., Vol. II,
    1911, pp. 311-321.
14. Stroud, J. B. and Ridgeway, C. W.: “The relative efficiency of the whole, part
    and progressive part methods when trials are massed: A minor experiment.”
    J. Educ. Psychol., Vol. XXIII, 1932, pp. 632-634.




APPENDIX A 




Summer Days by D. Pond, age sixteen from Anthology of Poems by 
Children. 

Stanza I used for whole method in Experiment I and for part method in
Experiment III.

A gray sage stretches across the plain,
And the cactus blooms are red;
The earth is fresh from a summer’s rain,
And the winter days are fled;
The sweet-scented pines sway to and fro,
And the rivers are flooded with melting snow.


Stanza II used for part method in Experiment I and for whole method in
Experiment III.

A long trail winds to the sunset hills
Out over the mesas wide;
Through cañons cool, by tiny rills
With spruces on either side.
And I long with a longing I cannot still,
To be home again near the sunset hill.


APPENDIX B 


By Marteena Adamson (Current Science, April 19-19, 1935)
Stanza I used for whole learning, Experiment II.

When grandma was a little girl
She rode across the plains
In high-wheeled wagons rough and slow,
Because there were no trains.
The oxen slowly pulled the plow.
The grain was cut by hand,
The women spun their own coarse cloth,
The candles lit the land.


Stanza II used for part learning, Experiment II. 


But now you speed across the land
In cars and aeroplanes;
Sometimes you take a touring bus,
Sometimes you go on trains.
Your clothes are soft and factory-made;
You use electric lights
That light your homes, and towns and streets
And turn days into nights.


APPENDIX C 


From Browning’s “Andrea del Sarto.”
Part I used for whole learning, Experiment IV.

But do not let us quarrel any more,
No, my Lucrezia; bear with me for once:
Sit down and all shall happen as you wish.
You turn your face, but does it bring your heart?
I’ll work then for your friend’s friend, never fear,
Treat his own subject after his own way,
Fix his own time, accept too his own price,
And shut the money into this small hand
When next it takes mine. Will it? Tenderly?
Oh, I’ll content him—but to-morrow, Love!

Part II used for part learning, Experiment IV.

I often am much wearier than you think,
This evening more than usual, and it seems
As if—forgive now—should you let me sit
Here by the window with your hand in mine
And look a half-hour forth on Fiesole,
Both of one mind, as married people use,
Quietly, quietly the evening through,
I might get up to-morrow to my work
Cheerful and fresh as ever. Let us try.
To-morrow, how you shall be glad for this!
Your soft hand is a woman.


APPENDIX D 

First eight lines used for whole method, Experiments V and VI.
Red is rich, vital, and warm, 
Often used as a sign of alarm. 


Orange is bright, decorative, cheery, 
Should not be displayed by the aged or weary. 


Yellow is soft, cozy, and mellow, 
Bask in the sun’s glow, lazy yellow. 


Green is the sign of something growing, 
Refreshing, cool, life and vim glowing. 


Last eight lines used for part method, Experiments V and VI. 


Blue is aloof, distant, and cool, 
Blue skies, blue water in a blue pool. 




Violet, rich, warm, elderly, royal, 
No characteristic of those who toil. 


Black so forlorn, depressing like mourning, 
Mystical, classical, old folks adorning. 


White is the symbol of cleanliness, truth, 
Holiness, purity, background of youth. 


APPENDIX E 


Used for part learning in Experiment VII and for whole learning in
Experiment VIII.


Selenium 


Selenium belongs to the sulphur family. 

The atomic weight of Selenium is 79. 

It is obtained as a by-product in the refining of copper. 
It conducts electricity in the light but not in the dark. 
It is added to glass to produce a fine red color. 


Used for whole learning in Experiment VII and for part learning in Experi- 
ment VIII. 


Germanium 


Germanium is a white lustrous metal. 

The melting point of Germanium is 958. 

It was discovered by Winkler in the year 1886. 

It dissolves in sulphuric acid but not in hydrochloric acid. 
It unites with oxygen to form Germanium oxide. 


APPENDIX F 


Used for part learning in Experiments IX and X. 


1. Nine spaces from the top of the paper, type the word, Kentucky. 
2. Space down fifteen times and write your home address. 

3. On the eleventh line below this, type the name of your school. 

4. Go down eleven spaces further and write today’s date. 

5. Now space down seven times and type your full name. 


Used for whole learning in Experiments IX and X. 


1. Type your last name twelve spaces from the top edge of the sheet. 

2. Eight spaces below this write Rockwood High School. 

3. Space down fourteen times and type the words, Jefferson County. 

4. Fourteen spaces below this line, type the words, Louisville, Kentucky.
5. Space down eight times and type your first name in capitals. 











THE VARIATION IN PATTERN OF FACTOR LOADINGS 
RUSSELL C. SMART¹


Institute of Child Welfare, University of Minnesota 


Although in some instances as many as thirty or more tests have 
been used, many of the studies involving factor analysis reported in 
the literature have employed a rather limited number of variables. 
The question may well be raised whether such a limited number 
furnishes a sufficiently adequate sampling of total behavior to justify 
speaking of factors as general. If the number of tests had been 
increased it is improbable that the same factors would have been found. 
This study was undertaken to discover whether or not the factors 
remained stable when the number of variables was small, that is, 
whether or not factor loadings maintained their size and relative 
standing as more tests were added to the correlation matrix. 

The following tests were used as variables. The tests will be 
designated hereafter by the shortened terms. In the McCarthy 
Language Survey the children are observed in two situations—con- 
trolled and free. In the controlled situation the children are observed 
singly. Pictures, toys, and other objects of interest are exhibited, and 
fifty consecutive responses of the child are recorded verbatim. In the 
free situation, fifty consecutive responses are likewise recorded while 
the child is engaged in free play with other children. 


Minnesota Preschool Scale
  (Verbal and non-verbal parts scored
  separately)................................. Minn V
                                               Minn NonV
Arthur Performance Scale...................... Arthur
Stanford Revision of the Binet Scale.......... Stanford

McCarthy Language Survey
  (Each measure from the free and the
  controlled situations)
  Average number of words in fifty responses.. Average 50 c
                                               Average 50 f
  Average number of words in the five longest
  responses................................... Average 5 longest c
                                               Average 5 longest f
  Per cent of pronouns in the total number of
  words....................................... Per cent pronouns c
                                               Per cent pronouns f





1 The writer wishes to express his indebtedness to Dr. Florence L. Goodenough 
and Dr. John E. Anderson for their help and suggestions. 




The age level at which it was decided to make this analysis was 
five and a half years, because this age is below the upper limit of two 
of the intelligence tests and above the lower limit of the other. Scores 
for sixty-six nursery-school and kindergarten children were available. 
In the case of the Language Survey, all of the tests were given at the 
age of sixty-six months; for the Minnesota and Arthur tests the age 
range was from sixty-four to sixty-seven months; and for the Stanford- 
Binet the range was from fifty-five to ninety-one months. Because 
of the wide range in ages, especially for the Stanford, correlations of
the tests with chronological age were computed and included in the
correlation matrix (Table I). Age thus becomes one of the variables,
and loadings in the factors are computed for it.

TABLE I.—INTERCORRELATIONS BETWEEN THE TEN TESTS AND CHRONOLOGICAL AGE*

                   Minn    Minn                    Stan-   Ave.    Ave.    % pron. % pron. Ave. 5
                   V       NonV    Arthur  Age     ford    50 c    50 f    c       f       long. c

Minn NonV          .3531
Arthur             .4121   .8214
Age                .0003   .1242   .2286
Stanford           .4108   .2596   .1592   .6192
Ave. 50 c          .2386   .1099   .1182   .0000   .0981
Ave. 50 f          .4236   .3145   .4856   .0000   .3143   .2954
Per cent pron. c  -.3005  -.1351  -.1366   .0000   .0766   .2909  -.0740
Per cent pron. f   .0376   .0173   .0272   .0000   .2567   .0529  -.0414   .0162
Ave. 5 long. c     .2909   .0096   .1067   .0000   .1780   .5353   .2471  -.0515   .1137
Ave. 5 long. f     .6211   .3850   .4248   .0000   .4643   .4525   .7920  -.0310   .0086   .7688

* PE_r = .0873, when r = .00.

A factor analysis, according to Thurstone’s 1933 method of 
multiple factor analysis, was made of the eleven variables. 
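
The computation of a first factor can be illustrated with a minimal sketch. It assumes that the 1933 method is Thurstone's centroid procedure, that the diagonal of the correlation matrix is filled with the highest correlation in each column, and that no reflection of signs is needed for the variables chosen; none of these details is stated in the text, and the function and variable names are hypothetical. The correlations are copied from Table I for five variables plus Average 5 longest f.

```python
import math

NAMES = ["Minn V", "Minn NonV", "Arthur", "Age", "Stanford", "Ave 5 longest f"]

# Correlations copied from Table I for the six variables named above.
R = {
    ("Minn V", "Minn NonV"): .3531,
    ("Minn V", "Arthur"): .4121,
    ("Minn V", "Age"): .0003,
    ("Minn V", "Stanford"): .4108,
    ("Minn V", "Ave 5 longest f"): .6211,
    ("Minn NonV", "Arthur"): .8214,
    ("Minn NonV", "Age"): .1242,
    ("Minn NonV", "Stanford"): .2596,
    ("Minn NonV", "Ave 5 longest f"): .3850,
    ("Arthur", "Age"): .2286,
    ("Arthur", "Stanford"): .1592,
    ("Arthur", "Ave 5 longest f"): .4248,
    ("Age", "Stanford"): .6192,
    ("Age", "Ave 5 longest f"): .0000,
    ("Stanford", "Ave 5 longest f"): .4643,
}


def first_centroid_loadings(names, r):
    """First centroid loadings for the variables in `names`.

    Assumption: each diagonal entry is estimated by the largest absolute
    correlation in its column, and no variable is reflected.
    """
    def corr(a, b):
        return r[(a, b)] if (a, b) in r else r[(b, a)]

    column_sums = {}
    for a in names:
        others = [corr(a, b) for b in names if b != a]
        diagonal = max(abs(x) for x in others)  # communality estimate
        column_sums[a] = diagonal + sum(others)

    total = sum(column_sums.values())
    # Each loading is the column sum divided by the square root of the grand sum.
    return {a: s / math.sqrt(total) for a, s in column_sums.items()}


five = first_centroid_loadings(NAMES[:5], R)   # intelligence tests and age
six = first_centroid_loadings(NAMES, R)        # one language test added
print(round(five["Minn V"], 2), round(six["Minn V"], 2))  # about 0.50 and 0.63
```

Under these assumptions the loading of Minn V moves from about .50 to about .63 when the sixth variable enters the matrix, which shows how a loading depends on the other variables analyzed with it.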

In the same way, analyses of different groupings of the variables
were made to examine the stability of the factors. In the first series
of these analyses, the four intelligence tests were analyzed, and
subsequently the McCarthy tests were added one at a time, starting with
the one showing the largest loading in the original analysis, Average 5
longest f, and continuing to Per cent pronouns c, the test which showed
the smallest loading. (Plate 1.) In the second series, the four
McCarthy tests showing the largest loadings were analyzed first.
Then, in order, Per cent pronouns f, Per cent pronouns c, Stanford,
Minnesota NonV, Minnesota V, and Arthur were added. (Plate 2.)
In the third series, starting with the analysis of the eleven variables, the
test showing the largest loading in one analysis was dropped out of the
succeeding one. (Plate 3.) Second factor loadings were calculated for
the various groups of variables in the first and second series. These
are shown in Plates 4 and 5.

[PLATE 1. First factor loadings of the variables in the first series of analyses.]

It will be seen in Plate 1 that as the language tests are added, the
loadings of the intelligence tests decrease, in general. In Plate 2 it
will be seen that likewise the loadings for the language tests decrease as
the intelligence tests are added. It is interesting, further, to notice
that in Plate 1 the loading for Minn V rises from .50 to .63 when
Average 5 longest f is added to the group, and that it thereafter remains
practically constant as the other language tests are added. The
loading for the latter, however, rises from .65 to .79 to .86 to .87 as
Average 50 f, Average 5 longest c, and Average 50 c are added,
respectively, and drops only to .85 as Per cent pronouns f and Per cent
pronouns c are added. The loadings for Average 5 longest c follow
much the same pattern. In Plate 2 it will be noticed that the loading
for Minn NonV rises from .43 to .60 when the Arthur test is added to
the group. The other intelligence tests remain relatively unchanged.

[PLATE 2. First factor loadings of the eleven variables in the second series of analyses (groups H to M and A).]


That the loadings of the first factor depend on the composition of
the group of variables is argued further by Plate 3. In this series, it
will be recalled, the variable having the largest loading in one analysis
was dropped in the succeeding one, e.g., Average 5 longest f is omitted
in N, Arthur in O, Stanford in P. Not only do the amounts of the
loadings vary greatly, but their rank orders change from one analysis
to the next.

[PLATE 3. First factor loadings of the variables in the third series of analyses.]

It will be seen that the loadings for the second factor change even
more violently than do those for the first factor. The change is
especially marked from E to F in Plate 4, and from M to A in Plate 5.
The change from J to K in Plate 5, though less violent than either
of the others, is still appreciable. The way in which the loadings
keep their order, and, in general, their size is very marked in F, G, and
A. This regularity is almost as marked in H, I, and J, and in K, L,
and M. The lack of consistency in the second factor loadings seems
to show that the second factor varies, depending on the composition
of the group of variables analyzed, even more than does the first factor.

[PLATE 4. Second factor loadings of the variables in the first series of analyses (groups B, C, D, E, F, G, and A).]

In naming the factors resulting from a factor analysis, the tests
are examined to ascertain what components the tests with large factor
loadings possess that the tests with small factor loadings have in only
small degree. For any given analysis this is a relatively easy task.
The factors found in studies reported in the literature have been named
“general ability to perform mental tasks of the kind presented by our
tests,” “verbal ability,” “visual form perception,” etc.

[PLATE 5. Second factor loadings of the variables in the second series of analyses.]

In the present study it has been shown that the factor loadings
change as the tests represented in the correlation matrix are changed.
The factors found in the various analyses would thus be named
differently, depending on the analysis selected. We could probably find
general terms to describe the factors in each case, which would satisfy
the observed differences in loadings. This is clearest in the second
factors. In C, D, E, or K, L, M, the second factor could justifiably
be named something akin to age or development. In F, G, or A,
however, the second factor could with as much right be called proficiency
in language. The first factor in any of the groupings would probably
be called general intelligence. These names are arrived at after
examination of each of the analyses in turn.

If, however, we observe the factor loadings in a series of the groups, 
we find that a general name for the first factor is probably not justi- 
fiable. Notice how, for instance, the loading for Minn NonV rises 
when Arthur, a very similar test, is added to the group (Plate 2). 
The “general ability” posited for group M seems not to be the same
as the “general ability” posited for group A. There is much less
“ability in performance tests” in the former than in the latter.
Similarly there is more “language ability” in the “general ability”
of D than in that of C, and still more in E and F.

Rather than call the factor general, implicitly assuming that the 
same factor will be found in any combination of similar tests, it would 
seem better to call it a common factor. Here we implicitly limit the
factor to the group of tests with which we are working. 

Furthermore there is evidence that the factor loadings vary as the 
groups of subjects given the tests are changed. In factor analyses 
made by the writer, of data collected by Rundquist and Sletto,¹ it
was found that not only did the factor loadings of the scales change 
as the groupings of scales were changed, but that they also changed 
from one group of subjects to another. In this study Rundquist and 
Sletto measure the effect of the depression on attitudes. Scales to 
measure morale, inferiority feelings, the family, law, economic con- 
servatism, and education were included. The subjects are as follows: 
DPR—fifty-two males, mid-depression cases (of higher socio-economic 
status than most people on relief); CSU—one hundred males and 
one hundred females, selected to match the population of Minneapolis 
as to proportions of socio-economic status; Soc I—one hundred males 
and one hundred females, typical college sophomores; CSHS—one 
hundred males and one hundred females, selected from high-school 
students, to match the population of Minneapolis, as above; Standard—five
hundred males and five hundred females, selected from the
larger groups of which the above are samples, to match the Minneapolis
population. The first and second factor loadings for all nine groups
of subjects are given in Table II. The first factor loadings are largest,
for all groups of subjects, in the morale scale. Other than this, the
factor loadings for both the factors calculated show no consistency
for the various groups of subjects. The greatest differences are
between the factor loadings of the scores made on the scales by male
students in Sociology I and the factor loadings of the scores made by
female students of Sociology I. That this is not entirely a sex
difference is shown by the fact that the loadings for the first and second
factors are not the same, in size and rank order, for any two groups of
subjects, even when groups of the same sex are compared.

1 Rundquist, E. A., and Sletto, R. F.: Personality in the Depression: A Study in
Attitude Measurement. Institute of Child Welfare Monograph No. XII. University
of Minnesota Press, Minneapolis, Minn., 1936. XXII + 398 p.

TABLE II.—FIRST- AND SECOND-FACTOR LOADINGS OF SIX ATTITUDE SCALES*

                         Standard male          Standard female        CSU male
Scale                    I      II     h²       I      II     h²       I      II     h²
Morale                   .796   .049   .636     .745  -.178   .586     .838   .176   .733
Inferiority              .570   .382   .470     .566  -.439   .513     .500   .571   .576
Family                   .538  -.198   .328     .549   .072   .306     .477  -.047   .230
Law                      .650  -.193   .460     .644   .466   .632     .676  -.399   .616
Economic conservatism    .299   .240   .147     .398  -.052   .161     .408   .000   .166
Education                .596  -.168   .383     .548   .091   .308     .562  -.114   .328
Mean squared loading     .3525  .0158           .3417  .0762           .3438  .0886

                         CSU female             HSCS male              HSCS female
Scale                    I      II     h²       I      II     h²       I      II     h²
Morale                   .769  -.256   .656     .728   .156   .554     .833   .122   .708
Inferiority              .604  -.352   .488     .553   .438   .498     .668   .469   .666
Family                   .618   .276   .458     .550  -.284   .383     .592   .150   .372
Law                      .698   .377   .629     .642  -.328   .520     .660  -.282   .515
Economic conservatism    .390  -.204   .194     .236   .207   .098     .498  -.164   .274
Education                .702   .229   .545     .580  -.206   .378     .609  -.188   .406
Mean squared loading     .4117  .0836           .3344  .0816           .4242  .0665

                         DPR male               Sociology I male       Sociology I female
Scale                    I      II     h²       I      II     h²       I      II     h²
Morale                   .798  -.120   .651     .843   .158   .736     .650   .182   .456
Inferiority              .712  -.344   .625     .582  -.031   .340     .324   .020   .105
Family                   .518  -.222   .318     .490   .023   .240     .474  -.162   .250
Law                      .760   .320   .680     .598  -.212   .402     .542   .054   .296
Economic conservatism    .512   .510   .522     .371  -.122   .152     .428   .562   .499
Education                .618  -.172   .412     .580   .564   .654     .234  -.375   .195
Mean squared loading     .4508  .0956           .3535  .0674           .2140  .0865

* For descriptions of scales and of subjects see Rundquist and Sletto, op. cit.
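
The h² column of Table II can be checked against the two loadings beside it. The sketch below assumes that h² is the communality, that is, the sum of the squared first- and second-factor loadings, and that the last row of each block is the mean of the squared loadings in its column; the text does not state either relation, and the function names are hypothetical. The figures are those tabled for the Sociology I female group.

```python
# (first-factor loading, second-factor loading, tabled h2) for the
# Sociology I female group, copied from Table II.
ROWS = {
    "Morale": (.650, .182, .456),
    "Inferiority": (.324, .020, .105),
    "Family": (.474, -.162, .250),
    "Law": (.542, .054, .296),
    "Economic conservatism": (.428, .562, .499),
    "Education": (.234, -.375, .195),
}


def communality(first, second):
    """Assumed relation: h2 is the sum of the squared loadings."""
    return first ** 2 + second ** 2


def mean_squared_loading(loadings):
    """Mean of the squared loadings of one factor over all scales."""
    return sum(a * a for a in loadings) / len(loadings)


for scale, (first, second, tabled) in ROWS.items():
    print(f"{scale:22s} computed {communality(first, second):.3f}  tabled {tabled:.3f}")

print(round(mean_squared_loading([row[0] for row in ROWS.values()]), 4))
print(round(mean_squared_loading([row[1] for row in ROWS.values()]), 4))
```

For this group the computed values agree with the tabled ones to within rounding, so the same check can serve to detect misprinted entries in the other groups.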

In the factor analyses of the intelligence tests it has been shown
that, using sub-groups of a group of ten tests, stable factors cannot be
found. The factors change, depending on the tests included in the
group analyzed. This is more apparent in the second factors than in 
the first. In the factor analyses of the attitude scales the factors 
change in the same way, when the grouping of variables is changed. 
There is also a striking change in factors from one group of subjects 
to another with the scales kept the same for all groups. With such 
a small number of variables as were used in these two series of analyses, 
factors are not found which remain stable as the tests that are analyzed 
or the groups taking the tests are changed. 













VARIOUS TECHNIQUES OF COMBINING RATINGS¹


RUDYARD K. BENT 
University of Arkansas 


One of the weaknesses of any study requiring the use of measures of 
teaching performance lies in securing reliable and valid judgments. 
The validity and reliability of rating scales are probably affected by 
halo influence and by variations in the use of ratings made by different 
judges. Different parts of rating scales are used by various teachers 
and similar numerical ratings are not necessarily comparable. Thus
the lack of well defined standards of achievement reduces the validity 
of teacher-rating scales. 

The purpose of this study was to determine whether coefficients of 
correlation between conditioning factors and judgments of student- 
teaching performance were influenced by varied methods of combining 
ratings. 

The scale employed to rate student-teachers at the University 
High School of the University of Minnesota makes provisions for rating 
teachers on ten items: Personal grooming, teaching personality,
loyalty and coöperation, vitality, knowledge of subject-matter, skill
in method, achievement of pupils, discipline and prediction. The
mechanics of the rating device are simplified by merely requiring the
raters to make a check mark along a line after each item which is
graduated in twelve divisions. The means and extremes are well
defined.²

The reliability of the scale found by repeated ratings by critic 
teachers was .58. 

Although the University High School Rating Scale has the limits 
and the mean of each item defined, teachers disagree as to the relative 
importance of the various items, and ratings made by several teachers 
are not comparable. Even though the several judges agree as to the 





1 Research paper, No. 451, Journal Series, University of Arkansas.
2 Item V of the rating scale: Consider the practice teacher’s knowledge of
subject-matter.

Knowledge inaccurate      Knowledge fairly          Extremely accurate and
and insufficient; no      accurate but limited;     extensive knowledge;
background                some background           excellent background













relative importance of items, their definitions of them will vary. These 
factors, if they exist, will tend to influence the judgments on all items 
of the scale. 

Still another factor lowers the validity of a scale if many ratings 
are to be combined, and that is the tendency on the part of some 
teachers to use a part of it only. On a twelve point scale, some 
teachers use the upper half only; others avoid the extremes, while still
others spread their judgments over the entire available range. These 
statements refer only to many and not to individual ratings. 

In an effort to reduce the influence of these factors on the Univer- 
sity High School Rating Scale, various techniques of combining the 
ratings were employed. 


THE SUBJECTS 


The subjects were forty-one student-teachers in the College of 
Education, University of Minnesota, whose major teaching subject 
was English. They did student-teaching during the year 1933-1934, 
and were rated during the same year. 


THE DATA 


The data were scores on qualifying examinations, Miller Analogies 
Test Scores, and student-teaching ratings. The qualifying examina- 
tions were comprehensive examinations normally given at the end of 
the junior year to all students in the College of Education. They 
covered four phases of subject-matter: (1) professional, covering the 
fields of educational psychology, techniques and principles of secondary 
or elementary education, (2) general English (The Columbia Research
Bureau English Test), (3) high-school content in the major field
(major 1A), and (4) college content in the major field (major 1B).
Each examination required about two hours. The Miller Analogies 
Test is purported to measure general mental ability. Besides separate 
scores on the four examinations, a composite score was also computed, 
weighting each test equally, thus considering them as one examination. 
The data on student-teaching were ratings by five critic teachers in 
the English department. Each had rated independently each student-
teacher with whom he had contact, and a composite rating of each
student was made by all critic teachers in conference. In this manner
there were two or three ratings for each student-teacher in addition 
to the composite rating. 




REDUCTION OF DATA INTO WORKING UNITS 


The check marks on the rating scale were turned into numerals 
from one to twelve, according to their position. Mid-point measures 
were used. 

The ratings for each subject were entered after his name on a master
data sheet. They were then averaged horizontally, to get the average
of each item, and vertically to get the average of each rating. The
vertical averages were then averaged again, giving a single numerical
rating for each subject, which will be referred to as the “average rating.”

TABLE I.—RANGE, MEAN, AND SD OF RATINGS MADE BY VARIOUS CRITICS
(UNIVERSITY HIGH SCHOOL RATING SCALE)

Teaching Personality—Item II

             Number of   Range in rating   Mean in rating            Range in
Rater        cases       scale units       scale units       SD      SD units
Composite      37          4-12              7.65            1.92    -1.9 to 2.3
A              16          4-10              6.31            2.02    -1.1 to 1.8
B              16          2-11              6.94            2.75    -1.8 to 1.5
C              10          3-10              6.20            1.99    -1.6 to 1.9
D              23          2-11              6.22            2.62    -1.6 to 1.8
E               3          3-8               5.00            2.16

Knowledge of Subject-matter—Item V

             Number of   Range in rating   Mean in rating            Range in
Rater        cases       scale units       scale units       SD      SD units
Composite      37          5-12              7.95            1.93    -1.5 to 2.1
A              16          4-11              7.13            1.00    -1.6 to 2.0
B              16          4-11              7.00            2.34    -1.3 to 1.7
C              10          4-10              6.50            2.06    -1.2 to 1.7
D              23          2-12              7.74            2.71    -2.1 to 1.6
E               3          6-10              7.33            1.89    -1.8 to 1.8

All Judgments Made by a Single Rater

             Number of   Range in rating   Mean in rating            Range in
Rater        cases       scale units       scale units       SD      SD units
Composite    1110          4-12              7.72            1.97    -1.9 to 2.2
A             480          3-12              7.14            2.17    -1.9 to 2.2
B             480          1-12              6.95            2.73    -2.2 to 1.8
C             300          2-11              6.29            2.20    -1.9 to 2.1
D             690          1-12              7.05            2.60    -2.3 to 1.9
E              90          3-10              6.47            1.97
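
The averaging just described can be sketched as follows. The layout of the master data sheet is not given in the text, so the sketch assumes one list of ten item scores per critic for a single student-teacher; the scores and the function names are invented for illustration.

```python
from statistics import mean

# Hypothetical ratings (1 to 12) of one student-teacher by two critics
# on the ten items of the scale. These values are not from the study.
ratings = {
    "A": [7, 6, 8, 7, 9, 6, 7, 8, 7, 6],
    "B": [9, 8, 9, 8, 10, 8, 9, 9, 8, 9],
}


def item_averages(ratings):
    """Average of each item over the critics who rated this subject."""
    return [mean(scores) for scores in zip(*ratings.values())]


def rating_averages(ratings):
    """Average of each critic's ten item scores."""
    return {critic: mean(scores) for critic, scores in ratings.items()}


def average_rating(ratings):
    """Mean of the per-critic averages: the single 'average rating'."""
    return mean(list(rating_averages(ratings).values()))


print(item_averages(ratings))
print(rating_averages(ratings))
print(average_rating(ratings))
```

Because the raw scores of the critics are averaged directly, a critic who habitually rates high pulls the average rating upward, which is the difficulty that the standard-score methods are meant to meet.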

Since critic teachers varied in their use of the scale as to the dis- 
tribution of their checks, combining by means of raw scores assumed 
equality which did not exist. An examination of the data presented 
in Table I reveals the extent to which the critics varied. In order to 
equate the ratings, they were turned into standard scores. Two 
methods were found to be available for this treatment (1) standard 
deviations computed from items II and V only, and (2) standard 
deviations from all judgments made by a single rater. 
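
The conversion into standard scores can be sketched as follows, using the means and SD's tabled under "All Judgments Made by a Single Rater" in Table I. The function names are hypothetical, and combining the critics by a simple mean of their SD units is an assumption made only for illustration; the text does not state how the converted ratings were combined.

```python
# (mean, SD) of all judgments made by each critic, copied from Table I.
RATER_STATS = {
    "A": (7.14, 2.17),
    "B": (6.95, 2.73),
    "C": (6.29, 2.20),
    "D": (7.05, 2.60),
    "E": (6.47, 1.97),
}


def to_sd_units(score, rater_mean, rater_sd):
    """Express a raw rating as a deviation from the rater's mean in SD units."""
    return (score - rater_mean) / rater_sd


def combined_standard_score(raw_by_rater):
    """Assumed combination rule: mean of the critics' ratings in SD units."""
    scores = [
        to_sd_units(raw, *RATER_STATS[rater])
        for rater, raw in raw_by_rater.items()
    ]
    return sum(scores) / len(scores)


# The composite ratings ran from 4 to 12 with mean 7.72 and SD 1.97,
# which gives the tabled range of about -1.9 to 2.2 SD units.
print(round(to_sd_units(4, 7.72, 1.97), 1), round(to_sd_units(12, 7.72, 1.97), 1))

# The same raw rating of 8 from critics A and B, in SD units and combined.
print(round(combined_standard_score({"A": 8, "B": 8}), 3))
```

A raw rating of 8 stands slightly farther above the mean of critic A than above that of critic B, since B's judgments are more widely spread.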

In computing the SD from items II and V only, a frequency distribution
was made for each critic for item II, item V, and the average (that is,
teaching personality, knowledge of subject-matter, and the average of
ten items for each subject, respectively). From the frequency
distributions, SD’s and means were computed, and the items resolved into
SD units from the mean. Items II and V on the rating scale were 
employed because it would have been indeed a very ambitious under- 
taking to have treated all ten items in this manner, and further because 
the five critics (independently) unanimously agreed in the belief that 
item II (teaching personality) was most important of all items in the 
scale and item V (knowledge of subject-matter) second in importance. 

The other method of computing the SD for a given rater was to
make a frequency distribution of every judgment made by that
critic. This method gives a much larger number of individual
judgments and seemed to be more valid. A critic may use the upper part
of the scale in rating a few subjects and another part for others. If a
large number of cases is considered, the distribution will approach
normality. This method of computing SD units will be referred to
as “SD from all judgments.”
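
The conversion of ratings into SD units can be sketched in a few lines
of Python. This is a minimal illustration only, not the author's own
computation: the ratings are invented, the name to_sd_units is
hypothetical, and the SD is assumed to be computed with N in the
denominator, which the article does not state. The same function serves
either method, depending on whether it is given a critic's ratings on
one item or all of that critic's judgments.

```python
from statistics import mean, pstdev


def to_sd_units(ratings):
    """Express each rating as a deviation from the rater's mean, in SD units."""
    m = mean(ratings)
    sd = pstdev(ratings)  # assumption: population SD (N in the denominator)
    return [round((r - m) / sd, 2) for r in ratings]


# Hypothetical ratings on the twelve-point scale (not data from Table I).
lenient_critic = [8, 9, 10, 11, 12]  # uses only the upper part of the scale
severe_critic = [2, 3, 4, 5, 6]      # uses only the lower part of the scale

print(to_sd_units(lenient_critic))
print(to_sd_units(severe_critic))
```

Both invented critics yield the same list of SD units, since each
subject keeps the same position relative to that critic's own mean even
though the raw ratings differ by six scale points.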

The SD for the composite ratings was much smaller than that for 
the individual ratings, as will be seen from Table I. This was expected,
since averaging tends to restrict variability. For the composite and
for rater “B” the lowest divisions of the scale were not used. The
mean for the composite rating was higher than the mean of the 
individual ratings, showing a tendency to shift scores upward while 
rating as a group. 

When the standard deviation was computed from all judgments of 
one critic, it was found to be about the same as that computed for the 
separate items. The ranges were larger both in terms of units on the 
scale and in SD units. 




RELATIONSHIP WITH THE CONDITIONING FACTORS 


Interrelationships were investigated between each of the qualifying 
examinations and a composite examination score on the one hand and 
items II, V, and the average rating on student-teaching on the other. 
For each set of relationships, four coefficients of correlation were 
computed for each type of data, i.e., raw scores on the twelve-point
scale, SD from single items, SD from all items, and the composite 
rating. 

With but two exceptions the coefficients of correlation between
teaching personality and the conditioning factors increased in
magnitude from the twelve-point scale, through SD units, to the
composite ratings (Table II A). In all cases the composite ratings
yielded higher coefficients than independent ratings.

For item V, knowledge of subject-matter, there is but one exception 
to the same observation, and that is for the professional examination 


TABLE II. RELATIONSHIP OF CONDITIONING FACTORS WITH AVERAGE OF ALL
RATINGS AND COMPOSITE RATINGS ON TEACHING SUCCESS, TEACHING
PERSONALITY, AND KNOWLEDGE OF SUBJECT-MATTER FOR ENGLISH
MAJORS

Description             Education General   Major     Major     Composite Analogies
                                  English   1A        1B        score

A. Teaching Personality (Item II)
Twelve point            .01       .19       -.11      .16       .11
SD from II only         .13       .28       .03       .26       .23
SD from all J           .10       .33       .00       .31       .25
Composite rating        .33       .39       .09       .35       .35

B. Knowledge of Subject-matter (Item V)
Twelve point            .32       .28       .05       .35       .31
SD from V only          .30       .35       .09       .43       .36
SD from all J           .34       .37       .10       .46       .38
Composite rating        .24       .48       .10       .47       .42

C. Student-teaching Success (Items I to X)
Twelve point            .12       .21       -.09      .15       .18       .02
SD from I-X totals      .18       .28       -.01      .31       .22
SD from all J           .14       .23       -.07      .28       .17
Composite rating        .26       .44       .07       .39       .36       .20



















(Table II B). In this case the SD from all judgments is superior, but 
all four r’s are about equal (range .24 to .34). 

When the average rating was compared with the qualifying
examinations, the composite was superior in all cases, the highest
coefficient being with scores on general English, and the lowest with
the major 1A examination (Table II C).

The coefficient of correlation between the average of the ten items 
for the composite ratings and scores on the Miller Analogies Test was 
.20, compared with a coefficient of only .02 when the average of inde- 
pendent ratings (raw scores) was used as the criterion. 

This increase in the magnitude of the coefficients of correlation
between the conditioning factors and the criteria is most likely
attributable to the increased validity of the ratings as developed by
the technique just described. The degree of relationship actually
existing between the two factors was not altered, but was more
accurately determined, because the device offsets the tendency of a
given critic to rate consistently high or low and places all ratings on
a common scale in terms of deviations from the central tendency, which
eliminates the effect of constant errors.
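
The effect of a constant error on a coefficient of correlation can be
illustrated with a small Python sketch. Everything in it is assumed for
the sake of the example: the examination scores and ratings are
invented, the two critics are made to differ by a deliberately large
constant, and the helper names pearson_r and to_sd_units are
hypothetical. It is not a reconstruction of the data behind Table II.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def to_sd_units(ratings):
    """Deviation of each rating from the rater's own mean, in SD units."""
    m, sd = mean(ratings), pstdev(ratings)
    return [(r - m) / sd for r in ratings]


# Invented data: six subjects, the first three rated by a lenient critic
# and the last three by a severe critic.
exam_scores = [50, 60, 70, 55, 65, 75]
lenient_ratings = [9, 10, 11]
severe_ratings = [3, 4, 5]

raw = lenient_ratings + severe_ratings
standardized = to_sd_units(lenient_ratings) + to_sd_units(severe_ratings)

r_raw = pearson_r(raw, exam_scores)
r_standardized = pearson_r(standardized, exam_scores)
print(round(r_raw, 2), round(r_standardized, 2))
```

With these invented figures the raw ratings show almost no relationship
to the examination scores, while the same ratings expressed in SD units
within each critic show a high one. The underlying relationship is
unchanged; only the constant difference between the critics has been
removed.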







SOCIAL DOMINANCE OF CLERICAL WORKERS AND 
SALES-PERSONS AS MEASURED BY THE 
BERNREUTER PERSONALITY INVENTORY 


ARTHUR F. DODGE 
Assistant Professor, Industrial Education, University of Illinois 


A comparison of test scores of three hundred clerical workers and 
one hundred fifty-four salespeople indicates significant differences 
between these two occupational groups with respect to social domi- 
nance as measured by the Bernreuter Personality Inventory. 

The data for Table I were secured from the test records of indi- 
viduals who came voluntarily to the Adjustment Service, New York 
City, for guidance. During the year 1933 approximately ten thousand 
applied to this organization for service and from this number it was 
possible to select groups of experienced workers in many occupations. 
Most of the applicants were unemployed at the time the tests were 
administered, but only those were selected for this study whose work 
experience indicated that they had been successful in the given
occupation. The basis for selection was as follows: (a) at least three
years’ experience in the given occupation, (b) at least one year of
employment with a single employer, and (c) a longer experience in the
given occupation than in any other.

The sales group of one hundred fifty-four was composed of three 
sub-groups: traveling salesmen, retail salesmen, and retail saleswomen. 


TABLE I. DOMINANCE-SUBMISSION SCORES OF OCCUPATIONAL GROUPS¹

                            Number      Median    PE of
                            in group    score     median

Traveling salesmen          50          +70       5.9
Retail salesmen             50          +56       10.6
Retail saleswomen           54          +45       8.4
Accountants (men)           46          +38       7.8
Stenographers (women)       50          +35       8.5
Office clerks (women)       50          +34       9.5
Secretaries (women)         50          +30       6.2
Bookkeepers (men)           54            0       6.9
Bookkeepers (women)         50           -2       8.0

















1 The distribution of scores for each of these occupational groups is shown in
Table XI, page 36, of Occupational Ability Patterns (see bibliography at end of
this article).




The clerical group of three hundred was composed of six sub-groups: 
accountants (men), stenographers (women), office clerks (women), 
secretaries (women), bookkeepers (men), and bookkeepers (women). 

As indicated in Table I, the median score of each of the sales 
sub-groups was higher than the median score of any of the clerical 
sub-groups. Also, it is interesting to note that the rank of these 
nine sub-groups agrees rather closely with the popular conception of 
the personality of such workers—from the aggressive traveling sales- 
man to the unassuming bookkeeper. 

The median score of the total group of one hundred fifty-four 
salespeople is +55.5 and the median score of the total group of three 
hundred clerical workers is +24.1. The difference between these 
medians is 5.9 times its probable error. 

If the group of traveling salesmen is compared with the total 
group of bookkeepers (both men and women), the difference between 
their median scores is found to be 71.1 or 9.1 times its probable error. 
Critical ratios of such magnitude strongly indicate that these differ- 
ences in median scores cannot be due to chance alone, but are at least 
partially due to a difference existing between experienced salespeople 
and experienced clerical workers with respect to the trait measured. 
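
The form of such a critical ratio can be sketched in Python. The sketch
rests on stated assumptions rather than on the author's worksheets: the
two groups are treated as independent, so that the probable error of a
difference is the square root of the sum of the squared probable errors,
and the function names are hypothetical. The example uses the Table I
figures for traveling salesmen (+70, PE 5.9) and bookkeepers, men (0,
PE 6.9); it does not reproduce the ratios quoted in the text, which
rest on combined groups whose probable errors are not tabled.

```python
import math

# One probable error is about 0.6745 standard errors on the normal curve.
PE_PER_SE = 0.6745


def pe_of_difference(pe_a, pe_b):
    """Probable error of a difference between two independent statistics."""
    return math.hypot(pe_a, pe_b)


def critical_ratio(stat_a, stat_b, pe_a, pe_b):
    """Difference between two statistics divided by its probable error."""
    return (stat_a - stat_b) / pe_of_difference(pe_a, pe_b)


# Table I: traveling salesmen versus bookkeepers (men).
ratio_in_pe = critical_ratio(70, 0, 5.9, 6.9)
ratio_in_se = ratio_in_pe * PE_PER_SE
print(round(ratio_in_pe, 2), round(ratio_in_se, 2))

# The reported ratio of 5.9 probable errors, restated in standard errors.
print(round(5.9 * PE_PER_SE, 2))
```

Under these assumptions the illustrative ratio is roughly 7.7 probable
errors, and the reported ratio of 5.9 probable errors corresponds to
about 4.0 standard errors, which permits comparison with results stated
in standard-error units.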

It will be noted that there is also close agreement between the 
median scores of the male and female groups representing the same 
occupation—retail salesmen and retail saleswomen with median scores 
of +56 and +45 respectively and men and women bookkeepers with 
median scores of 0 and —2 respectively. 

A report of the University of Minnesota Employment Stabilization 
Research Institute, based on its own extensive testing program, is in 
substantial agreement with the findings reported above:¹ “Thus the
dominance-submission scale alone differentiates significantly between 
occupational groups, and it is noteworthy that in each instance the 
results show the sales group to be significantly more ‘dominant in 
face-to-face situations’ than are the unskilled, the semi-skilled, and the 
skilled groups.” It should be noted in addition that Table II of this 
same bulletin also shows a comparison between the sales group and the 
clerical group indicating mean dominance-submission scores of 64.3 
and 45.8 respectively. The difference between these means is stated 
as 2.4 times its standard error. 





1 University of Minnesota Employment Stabilization Research Institute, 
Bulletin Series, Vol. III, No. 4, page 36. 




The significant differences noted above should not lead one to the 
conclusion that therefore the dominance-submission score is a valuable 
instrument for vocational guidance purposes. Many factors not 
included in this study are involved in making such an assumption. 
Below are some of the questions which must first be answered: 

Are the differences noted in this study due to the effects of the 
occupational environment? 


Is the possession of a large amount of the trait measured favorable 
to success as a salesman? 


How does the possession of a large amount of this trait affect the 
success of a clerical worker? 


CONCLUSIONS 


The dominance-submission score of the Bernreuter Personality 
Inventory appears to measure some trait or characteristic which 
experienced salespeople tend to possess to a greater degree than 
experienced clerical workers. 

No assumption should be made, however, as to the value of the 
Personality Inventory as a basis for vocational guidance although 
further study seems to be warranted. 


BIBLIOGRAPHY 


Dodge, Arthur F.: Occupational Ability Patterns. Teachers College, Columbia 
University, Contributions to Education, No. 658, 1935. 

Paterson, Donald G.: Research Studies in Individual Diagnosis. University of 
Minnesota Employment Stabilization Research Institute, Vol. III, No. 4
(1934). 




BOOK REVIEWS 


WILLIAM ERNEST HINRICHS. The Goodenough Drawing in Relation to
Delinquency and Problem Behavior. New York: Columbia 
University Press, 1936, pp. 82. 


The performance which eventuates when one is given pencil and 
paper and told to draw something—or anything—has long been the 
subject of study by psychologists and psychiatrists. In their efforts 
to discover predictive criteria of one sort or another, clinical psycholo- 
gists have devoted much attention to children’s drawings, an interest 
which was brought to a focus in 1926 when Goodenough published a 
quantitative scale which purported to be a drawing “test” of
intelligence. The child is instructed to draw a man, with only the one
restriction that it be a whole man. The score is obtained objectively
and is believed to be demonstrably free from the influence of artistic
talent or of practice in drawing. In testing feeble-minded children in
the New York City Children’s Hospital, Hinrichs found the Goodenough
Drawing Test to parallel closely the Binet; but two years’
experience with it at the Connecticut School for Boys, an institution
for delinquents, convinced him the drawings were not “gauging the
intelligence of the delinquent boys,” although he believed they were
indicating something. Thus the question arose, as it has arisen with
almost every other test: what is being measured?

In this monograph Hinrichs makes no attempt to answer the ques- 
tion directly. He does, however, describe a very sane and rather 
carefully controlled exploratory investigation of the Goodenough test 
in relation to delinquency. During the course of this study he obtained 
some evidence, at least, of what the test does not measure. The 
method of correlation indicated but a weak relationship to “general
intelligence” and to Developmental Age, though the author does seem
to believe the latter relationship more significant. He concludes,
“Correlations of the scale show drawing score to be only moderately
connected with ‘general intelligence’ at best, and to be closely enough
related to maturity as measured by the Furfey scale for Developmental
Age to indicate that it measures an aspect of development at least
distinctly ‘tinged’ with a non-intellectual maturity.”



Hinrichs’s investigation, however, was principally concerned with 
two other problems: (1) The possibility of extending the Goodenough 
scale and scoring technique for use with children above the Good- 
enough age limit of ten years; (2) the quantitative and qualitative 
differences between the drawing performance of delinquents and 
various control groups in which such factors as academic ability, 
intelligence, economic level, home background, and conduct disorder 
were held constant. 

To the first problem his results supply an affirmative conclusion. 
“Extension of the Goodenough Drawing Scale to ages above twelve 
to thirteen years has been found desirable and possible. Additional 
scoring points have been defined which constitute the beginning of 
such an Extension, and further possible points and scoring methods 
are suggested.”

The performance of delinquents was found quantitatively inferior to 
that of other control groups, though the differences were not sufficiently 
great or consistent to be more than a strong indication. There 
appeared also to be some suggestion of an association between behavior 
problems and low drawing scores. Qualitatively, however, the draw- 
ings of the delinquents presented stronger indications of inferiority. 
Greater juvenility was clear and more frequent instability suggested. 
Incongruity of the drawings seemed to offer the most satisfactory 
basis for qualitative classification. 

The investigation is a well-conceived and interesting exploratory 
study; it is, perhaps, more suggestive of future research than produc- 
tive of immediate results. The author demonstrates a real apprecia- 
tion of the scope of the problems involved and of the necessary 
controls. He is possibly a little more of an enthusiast for the drawing 
test than some of his readers would wish, yet generally sound and 
conservative in his conclusions. He believes that “the Goodenough
drawing-a-man test does measure some sort of maturity and that
in it we have an instrument which promises to be useful in
understanding certain behavior problems,” but that delinquency is too
complex for any such test to be capable of measuring, diagnosing, or
predicting it. “The drawing test, used in group surveys, will be found
not to be one hundred per cent accurate; it will pass over some who
should be caught; it will catch some who should not be caught; but it
should be a material help in locating and understanding incipient and
present behavior problems.”

CARLETON F. SCOFIELD.
University of Buffalo.




VERNON JONES. Character and Citizenship Training in the Public 
School. Chicago: The University of Chicago Press, 1936, pp. 
XI + 401. 


Dr. Jones’ book is a noteworthy contribution to the literature 
dealing with experimental studies in the field of character training. 
The study which he reports was conducted in an effort to determine 
whether or not measurable improvement in character traits can be 
brought about by classroom instruction; and to determine the relative 
efficacy of three specific methods of instruction in producing such 
improvement. 

The three methods which he attempted to compare were the
“first-hand-experiencing” method, in which various concrete
situations were responded to by the pupils; the “discussion” method,
which afforded no opportunity for “first-hand-experiencing,” but
consisted exclusively of discussion of moral principles and behavior in
a wide variety of situations; and the “experiencing-plus-discussion”
method, which was a combination of the other two methods.

Three fairly well equated groups, each consisting of one seventh- 
and one eighth-grade class, were subjected to a nine-months period of 
training, one group according to each method, while a similar group 
served as a control and received no training. All groups were tested 
at the beginning and at the end of the training period, on several 
measures of honesty, coöperation, and moral knowledge, e.g., the Maller
Coöperation Test, the Maller Self-Marking Test, and the Jones
Attitude Comparison Test. 

Jones’ findings, while they definitely indicate the possibility of 
improvement of specific character traits by classroom procedure, 
nevertheless emphasize the extreme difficulty of producing such 
improvement to any great extent. He found the
“experiencing-plus-discussion” method to be in general the most
effective. In comparing the three methods, the problem of specificity
vs. generality of character traits inevitably arose. The author adopts
a middle-of-the-road position. “Neither thorough-going specificity nor
generality alone seems an adequate principle to explain the
organization of character.”

Abundant information is included in the book concerning such 
topics as mental ability, home background, church attendance, motion 
pictures, and other factors in their relationship to character and 
behavior. Highly commendable is the careful analysis and cautious 







interpretation of the numerous statistics which the author has com- 
puted, particularly the correlation coefficients. The tentative nature 
of certain of the conclusions, due to the small numbers of cases on which 
they are based, and to the low reliabilities of some of the tests, is fully 
recognized. In view of the scarcity of accurate experimental findings 
in this field and of the growing demand for such data, this volume is 
especially welcome.

ROGER T. LENNON.
Fordham University.


CYRIL BURT. The Subnormal Mind. London: Oxford University
Press, 1935, pp. 368. 


It was very significant that a professor of psychology should be 
invited to deliver the third series of the Heath Clark Lectures at the 
London School of Hygiene and Tropical Medicine; and that so sound 
and highly respected a psychologist as Cyril Burt should be accorded 
this honor was indeed fortunate. Discussion of the subnormal mind, 
and more particularly of the diagnosis and treatment of subnormal 
school children, inevitably led Professor Burt into fields where contro- 
versy between medicine and psychology has been most frequent. 
When he announces his intention to restrict himself “to those forms of
subnormality which seem to be chiefly mental in their origin,” he
immediately, no doubt, prejudices his case in the minds of many of his
hearers. Yet he makes no concessions, and in the early paragraphs of
his first lecture accuses the medical student of too often knowing his
cerebral topography by heart while being blind “to all that is most
characteristic in the psychology of man.” Such a student of medicine,
and he might well have included many students of psychology, forgets,
says Burt, “that the reduction of conscious processes to material
processes means begging an unsolved metaphysical question, and that
those who maintain that mental phenomena must in the last resort be
interpreted in physical terms are committing themselves to a
hypothesis that the physicist of today would be the first to
discountenance.” And later, in discussing the neurotic, he declares,
“The picture of the mind as nothing but a peculiarly intricate machine
may serve at times to illustrate, but never to explain.” Underlying the
author’s entire discussion of the subnormal mind, therefore, is the
conviction that “descriptions, diagnoses, and even treatment within
this field must remain psychological rather than physiological.”

In order to examine the subnormal mind in proper perspective, it 
is essential that one clearly envision the normal. This background the 




author supplies by devoting his first lecture to “The Normal Mind.”
Therein he presents an extraordinarily comprehensive, yet concise, 
survey of the nature of the normal mind, along with a description of the 
methods available for investigating its more normal characteristics. 
The remainder of the lectures are devoted to The Mentally Deficient, 
The Dull or Backward, The Delinquent, The Neurotic, Asthenic 
Neuroses, Sthenic Neuroses, and the ascertainment and incidence of 
neuroses. In each case diagnosis, causes, incidence, and treatment 
are described and critically discussed. Samples of intelligence tests, 
tests of educational attainment, and neurotic questionnaires constitute 
the appendix. 

One fundamental point of view pervades the author’s discussion. 
Mental deficiency, dullness and backwardness, delinquency, neurosis 
are all manifestations of an inherited tendency to mental subnormality 
in any and every direction, possibly a form of “neuropathic diathesis,”
though Burt prefers to avoid the implications of this latter term.
“The real issue, as I have already urged, is not the limited one of mental
deficiency but the larger one of mental inefficiency. Certifiable 
defectives form but the fringe of a much larger portion of the popula- 
tion—the portion which is sometimes termed the ‘social problem class,’ 
and includes the dull, the backward, the unemployable, the habitually 
delinquent, in fact all who are subnormal in whatever direction.” 

In the light of this general thesis Burt’s insistence upon the inherit- 
ance of deficiency acquires a slightly different connotation, but he 
remains a staunch defender of the “nature” point of view.
Intelligence is “inborn, general, intellectual ability,” and the
diagnosis of the defective demands concern with these three factors.
The defect must be inborn and not acquired, general and not specific,
demonstrably due to intellectual rather than temperamental weakness.
“Throughout, the main points to be established are whether the
patient displays such inefficiency in his daily adjustments as to render
him a case for administrative action, and, if so, whether the inefficiency
is due, not directly to physical or environmental handicaps, but to an
incomplete development of mind.” The author is not particularly
alarmed about the apparent increase in deficiency, which he attributes 
more to rapid breeding than any other cause. He insists, however, 
that we shall never defeat the deficiency problem so long as we consider 
it an isolated matter, rather than part of a more general social problem
of subnormality which must be attacked simultaneously from several
angles.







The author’s treatment of delinquency gives more scope to the
influence of environmental factors upon the individual. “The
delinquent with a pedigree of vicious or criminal relatives is the
exception, not the rule.” “In the main, it will be observed, the whole
investigation of environmental factors endorses the conclusion
that I have already put forward: Namely, delinquency arises not
simply out of the delinquent’s own personality, but out of the total
situation.” However, great stress is laid upon intelligence, and more
particularly upon “temperamental deficiency,” as inherited
predisposing causes.

The lecture on the neurotic and the subsequent discussion of 
numerous types of neuroses are a valuable contribution to the clarifi- 
cation of this very confusing variety of subnormality. He makes no 
concessions to his medical audience as he distinguishes between the 
organic and the functional disease. He objects to classifying neuroses 
as “functional nervous diseases” because they are not diseases and have
nothing to do with the structures called nerves, and because to call
them functional merely “encourages the physician still further to
renounce all effort at direct investigation and treatment.” Neurotic
disorders are a special form of mental subnormality. They are
distinguished from some other forms by being acquired and characterized
chiefly by emotional origin and symptomatology. The author’s distinction
between the neurotic and the delinquent is worth quoting. “Both
are essentially the result of personal maladjustment. The distinction 
between the two is largely superficial: It is simply that delinquency 
manifests itself most conspicuously by moral disorder, while a neurosis 
manifests itself most conspicuously by an emotional disorder. In 
both, the patient’s attitude towards other persons and towards society 
is fundamentally disturbed; but in a neurosis the disturbance usually 
follows lines which society does not regard as illegal. What is affected 
is primarily the peace and efficiency of the patient, not the peace and 
property of other persons.” 

The final lecture on “Ascertainment and Incidence” does not
present a particularly pleasant picture. Apparently this is truly a
neurotic age, and the end is not yet. Here social medicine faces its
most perplexing problem. However, the author is not disheartened:
“Could we make an intensive attack on these cases in early life,
through child-guidance clinics or similar means, we should, I am 
convinced, not only save countless individuals from definite breakdown 
in later life, but enormously diminish the amount of unhappiness, 







inefficiency, and social friction that such conditions eventually 
engender.” 

The Subnormal Mind is a valuable contribution to psychological 
literature. It is not a book of sufficient substance for the student of 
clinical or abnormal psychology, but it was not intended for such a 
reader. The student of general psychology and the informed layman 
will find it interesting, informative, and provocative of thought on 
social problems. The medical student and physician should find 
it most enlightening and valuable. It is written from a generally
orthodox psychological point of view, presents interesting original
data, clarifies certain concepts, draws generally sound conclusions,
emphasizes important psychological and medical problems of social 
significance, and tells the student of medicine in no uncertain terms 
many things he should have been told long ago. 


CARLETON F. SCOFIELD.
University of Buffalo. 


A CORRECTION 


The article, “A formula for intercorrelations among multiscores”
in the September issue of this Journal, should have the following 
corrections made: 

vy should be r in the left-hand member of each of the formulas for 
correlation—Formulas (4), (5), (8), (11), (14), and (15), pp. 458-461. 

Read Formulas (4) and (8) as though the fraction line had not 
been broken, and as though the two parts of the numerator and the 
two parts of the denominator had not been separated. 

pig: in Equation (7) should not be subscripts; the equation will 
then read: 


> a hs = No}? = Noi (7) 


ELMER B. ROYER.










